I was watching Eric Meijer's lectures on functional programming and I found this example to be really nice and intriguing:
If I had to sum a list of numbers in an imperative way, I would do something like:
total=0
for each in range(0, 10):
total = total + each
where I am explaining how to do this, instead of just specifying what I want.
This expression in Python does the same thing:
sum(range(1,10))
and it is same as my original problem statement which is to "sum a list of numbers". This is a nice high-level programming language construct to have since is both readable and declarative.
range(1,10) captures the fact that this is a list of items.
sum captures the computation to be done.
So, my first thought was functions which return values are more useful than for-loops at-least in some scenarios. On further reading I also found for-loops are just a syntactic sugar for doing a jump operation which can also be replaced with a recursive function call with the proper base condition. Is that correct statement?
So generarlizing this, I just wrote a simple reduce function which looks like:
def reduce(operation, start, array):
# I think I could make this an expression too.
if len(array) == 1:
return operation(start, array[0])
return reduce(operation, operation(start, array[0]), array[1:])
I just wanted to know if this a good way to start thinking functionally i.e in terms of inputs and outputs as much as possible?
The advantage I can think of is:
We can create any number of partials like sum, product etc. But I think it can be implemented using loops as well.
The disadvantage is:
I am duplicating this array again and again. Space complexity is O(n^2). I can use indexes to avoid that problem, but the code will look messy.
Since there is no tail recursion in Python, it might create a huge stack. But that is an implementation detail to be aware of.
Re: Disadvantage #2
Yes it does create a huge stack. I have experienced this specifically with IronPython on Windows. If you get an error thrown deep into your number of recursions (unlikely for a simple sum, but when dealing with external APIs, it can happen) you will get a stacktrace back with an error thrown in every frame since the original call. This can make it very difficult to debug.
This code:
class ConnectedItem():
def __init__(self, name):
self.name = name
self.connected_to = None
def connect(self, item):
self.connected_to = item
def __repr__(self):
return "<item: {} connected to {}>".format(self.name, self.connected_to)
items = []
for l in ["a", "b", "c", "d"]:
items += [ConnectedItem(l)]
for n, i in enumerate(items):
if n < 3:
i.connect(items[n + 1])
def recursively_access(item):
# print(item.name)
return recursively_access(item.connected_to)
recursively_access(items[0])
Produces this:
This may not be exactly python, but recursion is a methodology that can be used in any language. Here is an example that can hopefully get you thinking along the right track.
recursiveMethod(param1, param2)
if (param1 > 10)
return from method
recursiveMethod(param1++, param2 += whateverOperation)
Related
First and foremost, I have searched many sites and have taken so much time finding a way to implement this specific requirement. To name a few, this and this from SO and many others from external sites.
The requirement is fairly simple to understand.
I cannot use import and can only use recursive to achieve this task. This function alone must be able to solve the problem by itself. No helper functions allowed.
I have to write a function with this definition:
def permutation(n: int) -> set[tuple[int]]:
The expected result when calling permutation(3) is as follows:
{(1,2,3),(1,3,2),(2,1,3),(2,3,1),(3,1,2),(3,2,1)}
I feel so frustrated that I cannot come up with any useful attempt to provide here. I tried to think of a solution for this but couldn't come up with anything useful. Therefore, I have no example code to start with.
The idea is that if you can get the list of every permutation for n - 1, you can insert n in between every point within those permuted results.
def permutation(n):
if n == 0:
# base case
return {()}
result = set()
for x in permutation(n - 1):
for i in range(n):
# insert n in position i in the permutation of all of the lesser numbers
result.add(x[:i] + (n,) + x[i:])
return result
The Problem:
Count the number of elements in a List using recursion.
I wrote the following function:
def count_rec(arr, i):
"""
This function takes List (arr) and Index Number
then returns the count of number of elements in it
using Recursion.
"""
try:
temp = arr[i] # if element exists at i, continue
return 1 + count_rec(arr, i+1)
except IndexError:
# if index error, that means, i == length of list
return 0
I noticed some problems with it:
RecursionError (when the number of elements is more than 990)
Using a temp element (wasting memory..?)
Exception Handling (I feel like we shouldn't use it unless necessary)
If anyone can suggest how to improve the above solution or come up with an alternative one, It would be really helpful.
What you have is probably as efficient as you are going to get for this thought experiment (obviously, python already calculates and stores length for LIST objects, which can be retrieved with the len() built-in, so this function is completely unnecessary).
You could get shorter code if you want:
def count(L):
return count(L[:-1])+1 if L else 0
But you still need to change python's recursion limit.
import sys; sys.setrecursionlimit(100000)
However, we should note that in most cases, "if else" statements take longer to process than "try except". Hence, "try except" is going to be a better (if you are after performance). Of course, that's weird talking about performance because recursion typically doesn't perform very well, due to how python manage's namespaces and such. Recursion is typically frowned upon, unnecessary, and slow. So, trying to optimize recursion performance is a littler strange.
A last point to note. You mention the temp=arr[i] taking up memory. Yes, possibly a few bytes. Of course, any calculation you do to determine if arr has an element at i, is going to take a few bytes in memory even simply running "arr[i]" without assignment. In addition, those bytes are freed the second the temp variable falls out of scope, gets re-used, or the function exits. Hence, unless you are planning on launching 10,000,000,000 sub-processes, rest assure there is no performance degradation in using a temp variable like that.
you are prob looking for something like this
def count_rec(arr):
if arr == []:
return 0
return count_rec(arr[1:]) + 1
You can use pop() to do it.
def count_r(l):
if l==[]:
return 0
else:
l.pop()
return count_r(l)+1
from pythonds.basic.stack import Stack
rStack = Stack()
def toStr(n,base):
convertString = "0123456789ABCDEF"
while n > 0:
if n < base:
rStack.push(convertString[n])
else:
rStack.push(convertString[n % base])
n = n // base
res = ""
while not rStack.isEmpty():
res = res + str(rStack.pop())
return res
print(toStr(1345,2))
I'm referring to this tutorial and also pasted the code above. The tutorial says the function is recursive but I don't see a recursive call anywhere, just a while loop. What am I missing?
You are right that this particular function is not recursive. However, the context is, that on the previous slide there was a recursive function, and in this one they want to show a glimpse of how it behaves internally. They later say:
The previous example [i.e. the one in question - B.] gives us some insight into how Python implements a recursive function call.
So, yes, the title is misleading, it should be rather Expanding a recursive function or Imitating recursive function behavior with a stack or something like this.
One may say that this function employs a recursive approach/strategy in some sense, to the problem being solved, but is not recursive itself.
A recursive algorithm, by definition, is a method where the solution to a problem depends on solutions to smaller instances of the same problem.
Here, the problem is to convert a number to a string in a given notation.
The "stockpiling" of data the function does actually looks like this:
push(d1)
push(d2)
...
push(dn-1)
push(dn)
res+=pop(dn)
res+=pop(dn-1)
...
res+=pop(d2)
res+=pop(d1)
which is effectively:
def pushpop():
push(dx)
pushpop(dx+1...dn)
res+=pop(dx)
I.e. a step that processes a specific chunk of data encloses all the steps that process the rest of the data (with each chunk processed in the same way).
It can be argued if the function is recursive (since they tend to apply the term to subroutines in a narrower sense), but the algorithm it implements definitely is.
For you to better feel the difference, here's an iterative solution to the same problem:
def toStr(n,base):
charmap = "0123456789ABCDEF"
res=''
while n > 0:
res = charmap[n % base] + res
n = n // base
return res
As you can see, this method has much lower memory footprint as it doesn't stockpile tasks. This is the difference: an iterative algorithm performs each step using the same instance of the state by mutating it while a recursive one creates a new instance for each step, necessarily stockpiling them if the old ones are still needed.
Because you're using a stack structure.
If you consider how function calling is implemented, recursion is essentially an easy way to get the compiler to manage a stack of invocations for you.
This function does all the stack handling manually, but it is still conceptually a recursive function, just one where the stack management is done manually instead of letting the compiler do it.
I am trying to calculate the length of a list. When I run it on cmd, I get:
RuntimeError: maximum recursion depth exceeded in comparison
I don't think there's anything wrong with my code:
def len_recursive(list):
if list == []:
return 0
else:
return 1 + len_recursive(list[1:])
Don't use recursion unless you can predict that it is not too deep. Python has quite small limit on recursion depth.
If you insist on recursion, the efficient way is:
def len_recursive(lst):
if not lst:
return 0
return 1 + len_recursive(lst[1::2]) + len_recursive(lst[2::2])
The recursion depth in Python is limited, but can be increased as shown in this post. If Python had support for the tail call optimization, this solution would work for arbitrary-length lists:
def len_recursive(lst):
def loop(lst, acc):
if not lst:
return acc
return loop(lst[1:], acc + 1)
return loop(lst, 0)
But as it is, you will have to use shorter lists and/or increase the maximum recursion depth allowed.
Of course, no one would use this implementation in real life (instead using the len() built-in function), I'm guessing this is an academic example of recursion, but even so the best approach here would be to use iteration, as shown in #poke's answer.
As others have explained, there are two problems with your function:
It's not tail-recursive, so it can only handle lists as long as sys.getrecursionlimit.
Even if it were tail-recursive, Python doesn't do tail recursion optimization.
The first is easy to solve. For example, see Óscar López's answer.
The second is hard to solve, but not impossible. One approach is to use coroutines (built on generators) instead of subroutines. Another is to not actually call the function recursively, but instead return a function with the recursive result, and use a driver that applies the results. See Tail Recursion in Python by Paul Butler for an example of how to implement the latter, but here's what it would look like in your case.
Start with Paul Butler's tail_rec function:
def tail_rec(fun):
def tail(fun):
a = fun
while callable(a):
a = a()
return a
return (lambda x: tail(fun(x)))
This doesn't work as a decorator for his case, because he has two mutually-recursive functions. But in your case, that's not an issue. So, using Óscar López's's version:
#tail_rec
def tail_len(lst):
def loop(lst, acc):
if not lst:
return acc
return lambda: loop(lst[1:], acc + 1)
return lambda: loop(lst, 0)
And now:
>>> print tail_len(range(10000))
10000
Tada.
If you actually wanted to use this, you might want to make tail_rec into a nicer decorator:
def tail_rec(fun):
def tail(fun):
a = fun
while callable(a):
a = a()
return a
return functools.update_wrapper(lambda x: tail(fun(x)), fun)
Imagine you're running this using a stack of paper. You want to count how many sheets you have. If someone gives you 10 sheets you take the first sheet, put it down on the table and grab the next sheet, placing it next to the first sheet. You do this 10 times and your desk is pretty full, but you've set out each sheet. You then start to count every page, recycling it as you count it up, 0 + 1 + 1 + ... => 10. This isn't the best way to count pages, but it mirrors the recursive approach and python's implementaion.
This works for small numbers of pages. Now imagine someone gives you 10000 sheets. Pretty soon there is no room on your desk to set out each page. This is essentially what the error message is telling you.
The Maximum Recursion depth is "how many sheets" can the table hold. Each time you call python needs to keep the "1 + result of recursive call" around so that when all the pages have been laid out it can come back and count them up. Unfortunately you're running out of space before the final counting-up occurs.
If you want to do this recursively to learn, since you're want to use len() in any reasonable situation, just use small lists, 25 should be fine.
Some systems could handle this for large lists if they support tail calls
Your exception message means that your method is called recursively too often, so it’s likely that your list is just too long to count the elements recursively like that. You could do it simply using a iterative solution though:
def len_iterative(lst):
length = 0
while lst:
length += 1
lst = lst[1:]
return length
Note that this will very likely still be a terrible solution as lst[1:] will keep creating copies of the list. So you will end up with len(lst) + 1 list instances (with lengths 0 to len(lst)). It is probably the best idea to just use the built-in len directly, but I guess it was an assignment.
Python isn't optimising tail recursion calls, so using such recursive algorythms isn't a good idea.
You can tweak stack with sys.setrecursionlimit(), but it's still not a good idea.
There's this script called svnmerge.py that I'm trying to tweak and optimize a bit. I'm completely new to Python though, so it's not easy.
The current problem seems to be related to a class called RevisionSet in the script. In essence what it does is create a large hashtable(?) of integer-keyed boolean values. In the worst case - one for each revision in our SVN repository, which is near 75,000 now.
After that it performs set operations on such huge arrays - addition, subtraction, intersection, and so forth. The implementation is the simplest O(n) implementation, which, naturally, gets pretty slow on such large sets. The whole data structure could be optimized because there are long spans of continuous values. For example, all keys from 1 to 74,000 might contain true. Also the script is written for Python 2.2, which is a pretty old version and we're using 2.6 anyway, so there could be something to gain there too.
I could try to cobble this together myself, but it would be difficult and take a lot of time - not to mention that it might be already implemented somewhere. Although I'd like the learning experience, the result is more important right now. What would you suggest I do?
You could try doing it with numpy instead of plain python. I found it to be very fast for operations like these.
For example:
# Create 1000000 numbers between 0 and 1000, takes 21ms
x = numpy.random.randint(0, 1000, 1000000)
# Get all items that are larger than 500, takes 2.58ms
y = x > 500
# Add 10 to those items, takes 26.1ms
x[y] += 10
Since that's with a lot more rows, I think that 75000 should not be a problem either :)
Here's a quick replacement for RevisionSet that makes it into a set. It should be much faster. I didn't fully test it, but it worked with all of the tests that I did. There are undoubtedly other ways to speed things up, but I think that this will really help because it actually harnesses the fast implementation of sets rather than doing loops in Python which the original code was doing in functions like __sub__ and __and__. The only problem with it is that the iterator isn't sorted. You might have to change a little bit of the code to account for this. I'm sure there are other ways to improve this, but hopefully it will give you a good start.
class RevisionSet(set):
"""
A set of revisions, held in dictionary form for easy manipulation. If we
were to rewrite this script for Python 2.3+, we would subclass this from
set (or UserSet). As this class does not include branch
information, it's assumed that one instance will be used per
branch.
"""
def __init__(self, parm):
"""Constructs a RevisionSet from a string in property form, or from
a dictionary whose keys are the revisions. Raises ValueError if the
input string is invalid."""
revision_range_split_re = re.compile('[-:]')
if isinstance(parm, set):
print "1"
self.update(parm.copy())
elif isinstance(parm, list):
self.update(set(parm))
else:
parm = parm.strip()
if parm:
for R in parm.split(","):
rev_or_revs = re.split(revision_range_split_re, R)
if len(rev_or_revs) == 1:
self.add(int(rev_or_revs[0]))
elif len(rev_or_revs) == 2:
self.update(set(range(int(rev_or_revs[0]),
int(rev_or_revs[1])+1)))
else:
raise ValueError, 'Ill formatted revision range: ' + R
def sorted(self):
return sorted(self)
def normalized(self):
"""Returns a normalized version of the revision set, which is an
ordered list of couples (start,end), with the minimum number of
intervals."""
revnums = sorted(self)
revnums.reverse()
ret = []
while revnums:
s = e = revnums.pop()
while revnums and revnums[-1] in (e, e+1):
e = revnums.pop()
ret.append((s, e))
return ret
def __str__(self):
"""Convert the revision set to a string, using its normalized form."""
L = []
for s,e in self.normalized():
if s == e:
L.append(str(s))
else:
L.append(str(s) + "-" + str(e))
return ",".join(L)
Addition:
By the way, I compared doing unions, intersections and subtractions of the original RevisionSet and my RevisionSet above, and the above code is from 3x to 7x faster for those operations when operating on two RevisionSets that have 75000 elements. I know that other people are saying that numpy is the way to go, but if you aren't very experienced with Python, as your comment indicates, then you might not want to go that route because it will involve a lot more changes. I'd recommend trying my code, seeing if it works and if it does, then see if it is fast enough for you. If it isn't, then I would try profiling to see what needs to be improved. Only then would I consider using numpy (which is a great package that I use quite frequently).
For example, all keys from 1 to 74,000 contain true
Why not work on a subset? Just 74001 to the end.
Pruning 74/75th of your data is far easier than trying to write an algorithm more clever than O(n).
You should rewrite RevisionSet to have a set of revisions. I think the internal representation for a revision should be an integer and revision ranges should be created as needed.
There is no compelling reason to use code that supports python 2.3 and earlier.
Just a thought. I used to do this kind of thing using run-coding in binary image manipulation. That is, store each set as a series of numbers: number of bits off, number of bits on, number of bits off, etc.
Then you can do all sorts of boolean operations on them as decorations on a simple merge algorithm.