Project Euler #2 Python 3.5 Help on Latency

I'm new to coding and am working through the Project Euler exercises to improve my knowledge of coding. I have come across several solutions to Project Euler #2.
However, I would like to know why my code takes so much longer to compute than a solution I found.
I would appreciate it if anyone could explain the differences between the two.
My code:
def fib(n):
    if n == 0:
        return 0
    elif n == 1:
        return 1
    else:
        f = fib(n-1) + fib(n-2)
        return f
i = 0
store = []
while fib(i) <= 4000000:
    i += 1
    if fib(i) % 2 == 0:
        store.append(fib(i))
print('The total is: ' + str(sum(store)))
Online Solution I found:
a = 1
b = 2
s = 0
while b <= 4000000:
    if not b % 2:
        s += b
    a, b = b, a + b
print(s)

To calculate fib(10) with your implementation:
fib(10) = fib(9) + fib(8)
in which fib(9) is calculated recursively:
fib(9) = fib(8) + fib(7)
See the problem? The result of fib(8) has to be calculated twice! If you expand the expression further (e.g., to get the result of fib(8)), the amount of redundant calculation grows enormously as the number gets bigger.
Recursion itself isn't the problem, but you have to store the result of smaller fibonacci numbers rather than calculating the same expression on and on. One possible solution is to use a dictionary to store the intermediate result.
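For example, here is a minimal sketch of that idea, using a plain dict named cache (the name is chosen here purely for illustration):

cache = {0: 0, 1: 1}  # base cases

def fib(n):
    # compute fib(n) only once; every later request is a dict lookup
    if n not in cache:
        cache[n] = fib(n - 1) + fib(n - 2)
    return cache[n]

With this, each fib(n) is computed exactly once, so the number of recursive calls becomes linear in n.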

You are using recursive calls to a function where the other solution uses a plain iterative loop.
Making a function call carries some overhead for calling it and returning from it. For bigger values of n you will make a very large number of those function calls.
Appending to a list over and over and summing it up is probably also slower than doing this via an accumulator.

Your solution calls a recursive function (which itself makes two recursive calls) every time the while condition is checked, and then inside the loop it runs that same function again.
The other solution only adds numbers and then swaps the variables with a tuple assignment.
I guess you didn't really need the Fibonacci function, but if you insist on using it, run it only once per iteration and save the result, instead of re-running it.
Plus, you store all your results and sum them at the end. That consumes time (and memory) too; you didn't really need to store the intermediate results.
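A minimal sketch of that advice, keeping the recursive fib but calling it only once per loop iteration and using a running total instead of a list (the variable names here are illustrative):

i = 0
total = 0
f = fib(i)
while f <= 4000000:
    if f % 2 == 0:
        total += f   # accumulate instead of storing and summing later
    i += 1
    f = fib(i)       # compute each Fibonacci number only once per iteration
print('The total is: ' + str(total))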

As several other answers pointed out, the recursion causes your fib() function to be called very often, 111 561 532 times in fact. This is easily seen by adding a counter:
count = 0

def fib(n):
    global count
    count += 1
    if n == 0:
        ...
# the rest of your program
print(count)
There are two ways to fix this; rewrite your program to be iterative rather than recursive (like the other solution you posted), or cache intermediate results from fib().
See, you call fib(8), which in turn has to call fib(7) and fib(6), etc, etc. Just calculating fib(8) takes 67 calls to fib()!
But later, when you call fib(9), that also calls fib(8), which has to do all the work over again (67 more calls to fib()). This gets out of hand quickly. It would be better if fib() could remember that it has already calculated fib(8) and reuse the result. This is known as caching or memoization.
Luckily, Python's standard library has a decorator just for that purpose, functools.lru_cache:
from functools import lru_cache

@lru_cache()
def fib(n):
    if n == 0:
        ...
On my computer, your program execution goes from 111 561 532 invocations of fib() in 27 seconds to 35 invocations in 0.028 seconds.

What is the advantage of generator in terms of memory in these two examples?

One of the advantages of a generator is that it uses less memory and consumes fewer resources. That is, we do not produce all the data at once and we do not allocate memory for all of it; only one value is generated at a time. The generator's state, including the values of its variables, is saved, so the code can be paused and resumed by calling it again.
I wrote two pieces of code and I am comparing them. I see that the generator can be written as a normal function, and now I do not see any point to the generator. Can anyone tell me what the advantage of this generator is compared to writing it normally? One value is generated with each iteration in both of them.
The first code:
def gen(n):
    for i in range(n):
        i = i ** 2
        i += 1
        yield i

g = gen(3)
for i in g:
    print(i)
The second one:
def func(i):
    i = i ** 2
    i += 1
    return i

for i in range(3):
    print(func(i))
I know that the id of g is constant whereas the id of func(i) is changing.
Is that what the main advantage of a generator is?
To be specific about the code you posted: there is no difference in memory usage between the two approaches you have shown. The first is still preferable, though, because everything you need is inside the same generator function, whereas in the second case the loop and the function live in two different places, and every time you want to use the function you have to repeat the loop outside it, which adds unnecessary redundancy.
Actually, the two functions you have written, the generator and the normal function, are not equivalent.
In the generator, you are returning all the values, i.e. the loop is inside the generator function:
def gen(n):
    for i in range(n):
        i = i ** 2
        i += 1
        yield i
But, in the second case, you are just returning one value, and the loop is outside the function:
def func(i):
    i = i ** 2
    i += 1
    return i
In order to make the second function equivalent to the first one, you need to have the loop inside the function:
def func(n):
    for i in range(n):
        i = i ** 2
        i += 1
        return i
Now, of course, the above function always returns a single value (the one computed for i = 0, assuming the loop body runs at all), so to fix this you need to return an entire sequence, which requires a list or similar data structure that can store multiple values:
def func(n):
    result = []
    for i in range(n):
        i = i ** 2
        i += 1
        result.append(i)
    return result

for v in func(3):
    print(v)
1
2
5
Now you can clearly differentiate the two cases: in the first, each value is evaluated sequentially and processed (here, printed) one at a time, but in the second, you end up holding the entire result in memory before you can actually process any of it.
The main advantage shows up when you have a large dataset. It is basically the idea of lazy evaluation: a value is not computed unless it is required. This saves resources, because with a list the entire thing is built at once, which can take up a lot of primary memory if the data is large enough.
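A quick way to see this is with sys.getsizeof (a rough sketch; the exact sizes vary by Python version and platform):

import sys

# a generator expression: a constant-size object regardless of n
g = (i * i + 1 for i in range(1000000))
# the equivalent list: holds every value at once
lst = [i * i + 1 for i in range(1000000)]

print(sys.getsizeof(g))    # a few hundred bytes, independent of n
print(sys.getsizeof(lst))  # megabytes, and it grows with n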
The advantage of the first code is relative to something you did not show: generating and consuming one value at a time takes less memory than first generating all the values, collecting them in a list, and then consuming them from the list.
The second code that the first should really be compared against is:
def gen2(n):
    result = []
    for i in range(n):
        i = i ** 2
        i += 1
        result.append(i)
    return result

g = gen2(3)
for i in g:
    print(i)
Note how the result of gen2 can be used exactly like the result of gen from your first example, but gen2 uses more memory as n gets larger, whereas gen uses the same amount of memory no matter how large n is.

Python : Counting execution of recursive call

I am using the Euler problems to test my understanding as I learn Python 3.x. After I cobble together a working solution to each problem, I find the posted solutions very illuminating and I can "absorb" new ideas after I have struggled myself. I am working on Euler 024 and I am trying a recursive approach. In no way do I believe my approach is the most efficient or most elegant; however, I successfully generate a full set of permutations, increasing in value (because I start with a sorted tuple), which is one of the outputs I want.

In addition, in order to find the millionth in the list (which is the other output I want, but can't yet get), I am trying to count the permutations as I create them, and that's where I get stuck. In other words, I want to count the number of times the recursion reaches the base case, i.e. a completed permutation, not the total number of recursive calls. I have found on Stack Overflow some very clear examples of counting executions of recursive calls, but I am having no luck applying the idea to my code.

Essentially, my problem so far is "passing back" the count of completed permutations using a return statement, which I think I need to do because of the way my for loop creates the "stem" and "tail" tuples. At a high level, either I can't get the counter to increment (so it always comes out as "1" or "5"), or the "nested return" just terminates the code after the first permutation is found, depending on where I place the return. Can anyone help me insert the counting into my code?
First the "counting" code I found in SO that I am trying to use:
def recur(n, count=0):
    if n == 0:
        return "Finished count %s" % count
    return recur(n-1, count+1)

print(recur(15))
Next is my permutation code with no counting in it. I have tried lots of approaches, but none of them work, so the following has no "counting" in it, just a comment at the point in the code where I believe the counter needs to be incremented.
#
# euler 024 : Lexicographic permutations
#
import time
startTime = time.time()
#
def splitList(listStem, listTail):
    for idx in range(0, len(listTail)):
        tempStem = ((listStem) + (listTail[idx],))
        tempTail = ((listTail[:idx]) + (listTail[1+idx:]))
        splitList(tempStem, tempTail)
    if len(listTail) == 0:
        #
        # I want to increment counter only when I am here
        #
        print("listStem=", listStem, "listTail=", listTail)
#
inStem = ()
#inTail = ("0","1","2","3","4","5","6","7","8","9")
inTail = ("0","1","2","3")
testStem = ("0","1")
testTail = ("2","3","4","5")
splitList(inStem, inTail)
#
print('Code execution duration : ', time.time() - startTime, ' seconds')
Thanks in advance,
Clive
Since it seems you've understood the basic problem and just want to see how the recursion happens, all you need to do is pass a mutable counter through the call stack. You can add a third argument to your function and increment it each time you reach the base case; a one-element list works because every recursive call shares the same underlying object:
def splitList(listStem, listTail, count):
    for idx in range(0, len(listTail)):
        ...
        splitList(tempStem, tempTail, count)
    if len(listTail) == 0:
        count[0] += 1
        print('Count:', count)
        ...
Now, call this function like this (same as before):
splitList(inStem, inTail, [0])
Why don't you write a generator for this?
Then you can just stop at the nth item ("drop while i < n").
My solution uses itertools, but you can use your own permutation generator; just yield the next sequence member instead of printing it.
from itertools import permutations as perm, dropwhile as dw

print(''.join(dw(
    lambda x: x[0] < 1000000,
    enumerate(perm('0123456789'), 1)
).__next__()[1]))
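As a sketch of that suggestion applied to your own code (your splitList rewritten here as a generator; itertools.islice then picks out the item at index 999999, i.e. the millionth permutation):

from itertools import islice

def splitList(listStem, listTail):
    # base case: no tail left, so listStem is a complete permutation
    if len(listTail) == 0:
        yield listStem
    for idx in range(0, len(listTail)):
        tempStem = listStem + (listTail[idx],)
        tempTail = listTail[:idx] + listTail[1+idx:]
        # re-yield every permutation produced by the recursive call
        yield from splitList(tempStem, tempTail)

millionth = next(islice(splitList((), tuple("0123456789")), 999999, None))
print(''.join(millionth))

Because the starting tuple is sorted, the permutations come out in lexicographic order, so no counting argument is needed at all.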

Time and Space Complexity Trouble

I've seen so many time complexity problems, but none seem to aid my understanding of it, like really getting it.
What I have taken from my readings and attempts at practice all seems to come down to what was mentioned in the answer coder gave here: Determining complexity for recursive functions (Big O notation). That answer did in fact help me understand a little more about what's going on with time complexity.
What about a function such as this:
def f(n):
    if n < 3:
        return n
    if n >= 3:
        return f(n-1) + 2*f(n-2) + 3*f(n-3)
Since the function calls the function 3 times, does that mean that the time complexity is O(3^n)?
As for the space complexity, it seems to be linear, hence I propose the complexity is O(n).
Am I wrong about this?
Since the function calls the function 3 times
This isn't really correct; rather, let's use examples that are more exact than your ad hoc one.
def constant(n):
    return n*12301230
This will always run in the same amount of time and is therefore O(1)
def linear(n):
    total = 0
    for x in range(n):
        total += 1
    return total
This has O(N) time
def quadratic(n):
    total = 0
    for x in range(n):
        for y in range(n):
            total += 1
    return total
This runs in quadratic time O(N^2) since the inner loop runs n times and the outer loop runs n times.
There are also more specific examples for log(N), N*log(N), 2^N, etc., but getting back to your question:
Since the function calls the function 3 times, does that mean that the time complexity is O(3^n)?
If the function is called 3 times, it will still be constant time for constant(x), linear for linear(x), and quadratic for quadratic(x): a constant number of calls does not change the complexity class. Importantly, O(3^n) is exponential time and is not the same as n^3, which is polynomial. For your recursive function, three recursive calls per level do give an easy upper bound of O(3^n); the true base is smaller (it is the dominant root of x^3 = x^2 + x + 1, roughly 1.84), but the growth is exponential either way.
So your function takes constant time for n < 3. For n >= 3 it is recursive and harder to eyeball; the best way to approximate what it costs is to count the calls it makes (see the sketch below). If you provide another, non-recursive example I'll be happy to tell you its complexity.
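For instance, a minimal call-counting sketch (the test values here are arbitrary):

calls = 0

def f(n):
    global calls
    calls += 1
    if n < 3:
        return n
    return f(n-1) + 2*f(n-2) + 3*f(n-3)

for n in (10, 15, 20, 25):
    calls = 0
    f(n)
    print(n, calls)  # the call count multiplies by roughly 1.84 per extra n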
Hope this helps a bit. No graph quite does justice to how much faster 2^n grows in comparison to n^2, but this is a good start.

Why does backward recursion execute faster than forward recursion in python

I made an algorithm in Python for counting the number of ways of getting an amount of money with different coin denominations:
@measure  # a custom timing decorator (definition not shown)
def countChange(n, coin_list):
    maxIndex = len(coin_list)
    def count(n, current_index):
        if n > 0 and maxIndex > current_index:
            c = 0
            current = coin_list[current_index]
            max_coeff = int(n/current)
            for coeff in range(max_coeff+1):
                c += count(n-coeff*current, current_index+1)
        elif n == 0:
            return 1
        else:
            return 0
        return c
    return count(n, 0)
My algorithm uses an index to get a coin denomination and, as you can see, my index is increasing in each stack frame I get in. I realized that the algorithm could be written in this way also:
@measure  # same custom timing decorator
def countChange2(n, coin_list):
    maxIndex = len(coin_list)
    def count(n, current_index):
        if n > 0 and 0 <= current_index:
            c = 0
            current = coin_list[current_index]
            max_coeff = int(n/current)
            for coeff in range(max_coeff+1):
                c += count(n-coeff*current, current_index-1)
        elif n == 0:
            return 1
        else:
            return 0
        return c
    return count(n, maxIndex-1)
This time, the index is decreasing each stack frame I get in. I compared the execution time of the functions and I got a very noteworthy difference:
print(countChange(30, range(1, 31)))
print(countChange2(30, range(1, 31)))

>> Call to countChange took 0.9956174254208345 seconds.
>> Call to countChange2 took 0.037631815734429974 seconds.
Why is there a great difference in the execution times of the algorithms if I'm not even caching the results? Why does the increasing order of the index affect this execution time?
This doesn't really have anything to do with dynamic programming, as I understand it. Just reversing the indices shouldn't make something "dynamic".
What's happening is that the algorithm is input sensitive. Try feeding the input in reversed order. For example,
print(countChange(30, list(reversed(range(1, 31)))))
print(countChange2(30, list(reversed(range(1, 31)))))
Just as some sorting algorithms are extremely fast with already sorted data and very slow with reversed data, you've got that kind of algorithm here.
In the case where the input is increasing, countChange needs a lot more iterations to arrive at its final answer, and thus seems a lot slower. However, when the input is decreasing, the performance characteristics are reversed.
The number of combinations is not huge.
The reason is that going forward you have to explore every possibility, whereas going backwards you can eliminate large chunks of invalid solutions without having to actually calculate them.
Going forward you call count 500k times.
Going backwards your code only makes 30k calls to count ...
You can make both of these faster by memoizing the calls (or by changing your algorithm to not make duplicate calls), as sketched below.
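A minimal sketch of that memoization using functools.lru_cache (assuming coin_list stays fixed for the duration of the call, since the cache key is only (n, current_index)):

from functools import lru_cache

def countChange(n, coin_list):
    maxIndex = len(coin_list)

    @lru_cache(maxsize=None)  # cache results keyed on (n, current_index)
    def count(n, current_index):
        if n > 0 and maxIndex > current_index:
            c = 0
            current = coin_list[current_index]
            for coeff in range(n // current + 1):
                c += count(n - coeff * current, current_index + 1)
            return c
        elif n == 0:
            return 1
        return 0

    return count(n, 0)

print(countChange(30, list(range(1, 31))))

With the cache in place, each (amount, index) pair is solved once, so the forward and backward versions end up doing comparable amounts of work.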

Why am I exceeding max recursion depth?

Let me stop you right there, I already know you can adjust the maximum allowed depth.
But I would think this function, designed to calculate the nth Fibonacci number, would not exceed it, owing to the attempted memoization.
What am I missing here?
def fib(x, cache={1:0, 2:1}):
    if x is not 1 and x is not 2 and x not in cache:
        cache[x] = fib(x-1) + fib(x-2)
    return cache[x]
The problem here is the one that tdelaney pointed out in a comment.
You are filling the cache backward, from x down to 2.
That is sufficient to ensure that you only perform a linear number of recursive calls. The first call to fib(4000) only makes 3998 recursive calls.
But 3998 > sys.getrecursionlimit(), so that doesn't help.
Your code works, just set the recursion limit (default is 1000):
>>> def fib(x, cache={1:0, 2:1}):
...     if x is not 1 and x is not 2 and x not in cache:
...         cache[x] = fib(x-1) + fib(x-2)
...     return cache[x]
...
>>> from sys import setrecursionlimit
>>> setrecursionlimit(4001)
>>> fib(4000)
24665411055943750739295700920408683043621329657331084855778701271654158540392715480900341037863109301466772217246298779225347381716739917111656811808115144572111377140065605401849370481143115915879298729889299837810754445631650196416430463021568595514449785504918067352892206292173283858530346012173429628868997174476215957547377783717970112687386572949323519017556827320679430035556878941709655114722239428742346513312979142866654429342493275835380444580745987338376709572653405103186366562265469193320676382408395686924657068094675464095820220760924728356005277531399953644773206396258899040274360382236547862225150068048454183923080196405384824908283795801265204019342256579481802389814120936489222552142508107754509340549694342959926058170589410813569880167004050051440392247460055993434072332526101572422443738016276258104875526626L
>>>
The reason is: if you imagine a large tree, your root node is 4000, which connects to 3999 and 3998. You go all the way down one branch of the tree until you hit a base case, then come back up building the cache from the bottom. So the tree is well over 1000 levels deep, which is why you hit the limit.
To add to the discussion in the question comments, I wanted to summarize:
You're adding to the cache after the recursive step -- thus your cache isn't doing much.
You're also referring to the same cache value in all the calls. Not sure if that's what you want, but that's the behavior.
This style of recursion isn't idiomatic Python. However, what is idiomatic Python is to use something like a memoization decorator. For an example, look here: https://wiki.python.org/moin/PythonDecoratorLibrary#Memoize (With your exact example)
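For reference, a minimal sketch of what such a memoize decorator looks like (the linked wiki page has a more polished version):

from functools import wraps

def memoize(func):
    cache = {}
    @wraps(func)
    def wrapper(*args):
        # compute and store the result only on the first call with these args
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return wrapper

@memoize
def fib(n):
    return n if n < 2 else fib(n-1) + fib(n-2)

Note that this keeps the caching concern out of fib itself, though a top-down call like fib(4000) can still hit the recursion limit.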
Maybe this helps to visualise what is going wrong:
def fib(x, cache={0:..., 1:0, 2:1}):
    if x not in cache:
        cache[x] = fib(x-1) + fib(x-2)
    return cache[x]

for n in range(4000):
    fib(n)
print(fib(4000))
Works perfectly, as you explicitly build the cache bottom-up. (It helps here that default arguments are evaluated only once, at function definition, so the cache persists across calls.)
Btw: your initial dictionary is wrong. fib(1) is 1, not 0. I kept this numbering offset in my approach, though.
The trick to making memoization work well for a problem like this is to start at the first value you don't yet know and work up towards the value you need to return. This means avoiding top-down recursion. It's easy to iteratively compute Fibonacci values. Here's a really compact version with a memo list:
def fib(n, memo=[0, 1]):
    while len(memo) < n+1:
        memo.append(memo[-2] + memo[-1])
    return memo[n]
Here's a quick demo run (which goes very fast):
>>> for i in range(90, 101):
...     print(fib(i))
...
2880067194370816120
4660046610375530309
7540113804746346429
12200160415121876738
19740274219868223167
31940434634990099905
51680708854858323072
83621143489848422977
135301852344706746049
218922995834555169026
354224848179261915075
>>> fib(4000)
39909473435004422792081248094960912600792570982820257852628876326523051818641373433549136769424132442293969306537520118273879628025443235370362250955435654171592897966790864814458223141914272590897468472180370639695334449662650312874735560926298246249404168309064214351044459077749425236777660809226095151852052781352975449482565838369809183771787439660825140502824343131911711296392457138867486593923544177893735428602238212249156564631452507658603400012003685322984838488962351492632577755354452904049241294565662519417235020049873873878602731379207893212335423484873469083054556329894167262818692599815209582517277965059068235543139459375028276851221435815957374273143824422909416395375178739268544368126894240979135322176080374780998010657710775625856041594078495411724236560242597759185543824798332467919613598667003025993715274875
