Python: Maximum recursion depth error

I am having a problem with a 'Maximum recursion depth exceeded' error in Python.
I converted a Java function (I don't know Java, so it wasn't easy) to a Python function. It works for small lists, but when I use large lists I get that error. I tried sys.setrecursionlimit(10000), but then it seems it will never finish, maybe because I converted the Java code to Python incorrectly.
This is the Python code of the function:
def fun(a, b):
    inf = 10000
    c = []
    boolean = [[0 for x in xrange(len(b))] for x in xrange(len(a))]
    dp = [[inf for x in xrange(len(b))] for x in xrange(len(a))]

    def maxMatching(i, j):
        if i == -1:
            return 0
        if j == -1:
            return inf
        if dp[i][j] != inf:
            return dp[i][j]
        val1 = maxMatching(i, j - 1)
        val2 = abs(a[i] - b[j]) + maxMatching(i - 1, j - 1)
        if cmp(val1, val2) > 0:
            dp[i][j] = val2
            boolean[i][j] = True
        else:
            dp[i][j] = val1
        return dp[i][j]

    def add_to_list(i, j):
        if i == -1 or j == -1:
            return
        if boolean[i][j]:
            c.append(b[j])
            add_to_list(i - 1, j - 1)
        else:
            add_to_list(i, j - 1)

    maxMatching(len(a) - 1, len(b) - 1)
    add_to_list(len(a) - 1, len(b) - 1)
    return sorted(c, reverse=True)
a=[20, 19, 13]
b=[21, 20, 14, 11, 5]
c=fun(a, b)
assert c == [21, 20, 14]
The function should return the points from list b that are nearest to the points in list a.
I thought that converting this function to an iterative one would resolve the problem.
My question is: how do I make this function 100% iterative instead of recursive?
Thanks.

To remove the recursion, you need to make your functions iterative.
For add_to_list, this is easy. Something like this should work:
def add_to_list(i, j):
    while i != -1 and j != -1:
        if boolean[i][j]:
            c.append(b[j])
            i = i - 1
            j = j - 1
        else:
            j = j - 1
For maxMatching, this is also possible, but it takes a bit more work. Notice that your recursion fills the dp table from top left to bottom right, and that each cell only depends on values to its left and upper-left.
So what you can do is create helper tables (like dp and boolean) and fill them from top to bottom and left to right. For each cell you compute the value exactly as you do now, but instead of recursing you read the already-computed values from the helper tables.
This technique is called Dynamic Programming: building a solution from the solutions of smaller subproblems. Many problems that can be defined using some form of mathematical recursion can be solved with Dynamic Programming. See http://en.wikipedia.org/wiki/Dynamic_programming for more examples.
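For example, a bottom-up version of maxMatching combined with the iterative add_to_list could look roughly like this (a minimal sketch; the name fun_iterative, float('inf') instead of 10000, and Python 3-style range are my own choices):

def fun_iterative(a, b):
    inf = float("inf")
    n, m = len(a), len(b)
    dp = [[inf] * m for _ in range(n)]
    boolean = [[False] * m for _ in range(n)]
    # Fill dp/boolean row by row instead of recursing.
    for i in range(n):
        for j in range(m):
            # maxMatching(i, j - 1): inf when j - 1 == -1
            val1 = dp[i][j - 1] if j > 0 else inf
            # maxMatching(i - 1, j - 1): 0 when i - 1 == -1, inf when only j - 1 == -1
            diag = 0 if i == 0 else (dp[i - 1][j - 1] if j > 0 else inf)
            val2 = abs(a[i] - b[j]) + diag
            if val1 > val2:
                dp[i][j] = val2
                boolean[i][j] = True
            else:
                dp[i][j] = val1
    # Walk the boolean table back, exactly like the iterative add_to_list.
    c = []
    i, j = n - 1, m - 1
    while i != -1 and j != -1:
        if boolean[i][j]:
            c.append(b[j])
            i -= 1
            j -= 1
        else:
            j -= 1
    return sorted(c, reverse=True)

With the question's a = [20, 19, 13] and b = [21, 20, 14, 11, 5] this gives [21, 20, 14], matching the assert.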

Related

Top-down approach poorly implemented

I have been practicing some recursive / dynamic programming exercises, but I am struggling to convert this recursive problem into a top-down one. For example, for f(4, 4) the recursive version yields 5, whereas my top-down version yields 2.
I can see the base cases are the same, so I must be messing up the logic somehow, but I haven't found where.
The code is as follows:
def funct(n, m):
    if n == 0:
        return 1
    if m == 0 or n < 0:
        return 0
    return funct(n-m, m) + funct(n, m-1)
and the top-down approach is:
def funct_top_down(n, m):
    if n == 0:
        return 1
    if m == 0 or n < 0:
        return 0
    memo = [[-1 for x in range(m+1)] for y in range(n+1)]
    return funct_top_down_imp(n, m, memo)

def funct_top_down_imp(n, m, memo):
    if memo[n][m] == -1:
        if n == 0:
            memo[n][m] = 1
        elif m == 0 or n < 0:
            memo[n][m] = 0
        else:
            memo[n][m] = funct_top_down_imp(n-m, m, memo) + funct_top_down_imp(n, m-1, memo)
    return memo[n][m]
Thanks!
The basic idea is to compute all possible values of func instead of just one. That sounds like a lot, but since func needs all the previous values anyway, it is actually the same amount of work.
We place the computed values in an (N+1) x (M+1) matrix for the given N, M, so that it holds that
matrix[n][m] == func(n, m)
Observe that the first row of the matrix will be all 1, since f(0,m) returns 1 for every m. Then, starting from the second row, compute the values left to right using your recursive dependency:
matrix[n][m] = matrix[n-m][m] + matrix[n][m-1]
(don't forget to check the bounds!)
Upon completion, your answer will be in the bottom right corner of the matrix. Good luck!
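Putting that together, a minimal bottom-up sketch might look like this (the name funct_bottom_up and the loop structure are my own choices, and it assumes n, m >= 0):

def funct_bottom_up(n, m):
    # matrix[i][j] will hold funct(i, j)
    matrix = [[0] * (m + 1) for _ in range(n + 1)]
    matrix[0] = [1] * (m + 1)                # first row: funct(0, j) == 1 for every j
    for i in range(1, n + 1):
        for j in range(1, m + 1):            # column 0 stays 0, like funct(i, 0)
            above = matrix[i - j][j] if i - j >= 0 else 0   # funct(i - j, j); 0 when i - j < 0
            left = matrix[i][j - 1]                          # funct(i, j - 1)
            matrix[i][j] = above + left
    return matrix[n][m]

For example, funct_bottom_up(4, 4) gives 5, matching the plain recursive funct.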

Python implementation of the mergeSort algorithm

I came across the following implementation of the mergeSort algorithm:
def merge_sort(x):
    merge_sort2(x, 0, len(x)-1)

def merge_sort2(x, first, last):
    if first < last:
        middle = (first + last) // 2
        merge_sort2(x, first, middle)
        merge_sort2(x, middle+1, last)
        merge(x, first, middle, last)

def merge(x, first, middle, last):
    L = x[first:middle+1]
    R = x[middle+1:last+1]
    L.append(999999999)
    R.append(999999999)
    i = j = 0
    for k in range(first, last+1):
        if L[i] <= R[j]:
            x[k] = L[i]
            i += 1
        else:
            x[k] = R[j]
            j += 1
x = [17, 87, 6, 22, 41, 3, 13, 54]
x_sorted = merge_sort(x)
print(x)
I get most of it. However, what I don't understand are the following four lines of the merge function:
L = x[first:middle+1]
R = x[middle+1:last+1]
L.append(999999999)
R.append(999999999)
First of all: why does the slicing end with middle+1 ? Slicing an array in Python includes the last element, right? So, shouldn't it be sufficient to slice from first:middle ? So, what is the +1 there for?
Secondly: Why do I have to append the huge number to the lists? Why doesn't it work without? It doesn't, I checked that. But I just don't know why.
Q1: Slicing an array in Python includes the last element, right?
No. Like the range function, Python slicing doesn't include the last element.
>>> a = [1, 2, 3, 4, 5]
>>> a[1:4]
[2, 3, 4]
Q2: Regarding the below snippet.
L = x[first:middle+1]
R = x[middle+1:last+1]
L.append(999999999)
R.append(999999999)
Without appending those large numbers to the lists, your merge code would have had to look something like this:
# Merge the temp arrays L and R back into x
while i < len(L) and j < len(R):
    if L[i] <= R[j]:
        x[k] = L[i]
        i += 1
    else:
        x[k] = R[j]
        j += 1
    k += 1
# Copy any elements left over in L
while i < len(L):
    x[k] = L[i]
    i += 1
    k += 1
# Copy any elements left over in R
while j < len(R):
    x[k] = R[j]
    j += 1
    k += 1
As @Cedced_Bro pointed out in the comment section, those large numbers are sentinels used to detect that the end of one of the sides has been reached.
In the snippet above, once we run out of numbers in one list we fall out of the first loop and copy the remaining elements of the other list, if any, into x.
Appending those large numbers is an intelligent way to avoid those two extra loops, at the small cost of needlessly comparing 999999999 with the remaining elements of the other list.
You don't really need the spaghetti-style helper functions; simply recursing will do, as in this version from https://rosettacode.org/wiki/Sorting_algorithms/Merge_sort#Python:
from heapq import merge

def merge_sort(m):
    if len(m) <= 1:
        return m
    middle = len(m) // 2
    left = m[:middle]
    right = m[middle:]
    left = merge_sort(left)
    right = merge_sort(right)
    return list(merge(left, right))
The indexing doesn't need the +1, since Python slices that meet at the same index don't overlap, i.e.
>>> x = [1,2,3,4,5,6]
>>> middle = 4
>>> x[:middle]
[1, 2, 3, 4]
>>> x[middle:]
[5, 6]
Moreover, the heapq implementation of merge is likely more efficient than what you could write by hand =)

Solving the "firstDuplicate" question in Python

I'm trying to solve the following challenge from codesignal.com:
Given an array a that contains only numbers in the range from 1 to a.length, find the first duplicate number for which the second occurrence has the minimal index. In other words, if there are more than 1 duplicated numbers, return the number for which the second occurrence has a smaller index than the second occurrence of the other number does. If there are no such elements, return -1.
Example
For a = [2, 1, 3, 5, 3, 2], the output should be
firstDuplicate(a) = 3.
There are 2 duplicates: numbers 2 and 3. The second occurrence of 3 has a smaller index than the second occurrence of 2 does, so the answer is 3.
For a = [2, 4, 3, 5, 1], the output should be
firstDuplicate(a) = -1.
The execution time limit is 4 seconds.
The guaranteed constraints were:
1 ≤ a.length ≤ 10^5, and
1 ≤ a[i] ≤ a.length
So my code was:
def firstDuplicate(a):
    b = a
    if len(list(set(a))) == len(a):
        return -1
    n = 0
    answer = -1
    starting_distance = float("inf")
    while n != len(a):
        value = a[n]
        if a.count(value) > 1:
            place_of_first_number = a.index(value)
            a[place_of_first_number] = 'string'
            place_of_second_number = a.index(value)
            if place_of_second_number < starting_distance:
                starting_distance = place_of_second_number
                answer = value
            a = b
        n += 1
        if n == len(a)-1:
            return answer
    return answer
Out of the site's 22 tests, I passed all of them up to #21, where the test list was large and the execution time exceeded 4 seconds. What are some tips for reducing the execution time while keeping the code more or less the same?
As @erip has pointed out in the comments, you can iterate through the list, adding items to a set; if an item is already in the set, it is the duplicate whose second occurrence has the lowest index, so you can simply return it, or return -1 if you get to the end of the loop without finding a duplicate:
def firstDuplicate(a):
    seen = set()
    for i in a:
        if i in seen:
            return i
        seen.add(i)
    return -1
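For example, with the sample inputs from the question:
>>> firstDuplicate([2, 1, 3, 5, 3, 2])
3
>>> firstDuplicate([2, 4, 3, 5, 1])
-1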
Create a new set, check whether each element is already in it, and if it is, return that element:
def firstDuplicate(a):
    dup = set()
    for i in range(len(a)):
        if a[i] in dup:
            return a[i]
        else:
            dup.add(a[i])
    return -1
This is just an idea; I didn't verify it, but it should work. It seems there is no memory limit, only a time limit, so trading space for time is probably a practical approach. The computational complexity is O(n). This algorithm also relies on the condition that the numbers are in the range 1 to len(a).
def first_duplicate(a):
    len_a = len(a)
    b = [len_a + 1] * len_a
    for i, n in enumerate(a):
        n0 = n - 1
        if b[n0] == len_a + 1:
            b[n0] = len_a
        elif b[n0] == len_a:
            b[n0] = i
    min_i = len_a
    min_n = -1
    for n0, i in enumerate(b):
        if i < min_i:
            min_i = i
            min_n = n0 + 1
    return min_n
Update:
This solution is not as fast as the set() solution by @blhsing. However, the comparison might differ if it were implemented in C; it is somewhat unfair, since set() is a built-in implemented in C, like the other core functions of CPython.

Index out of range in implementation of a variation of mergesort algorithm in python?

I have written a variation of the merge sort algorithm in Python, based on what I've learnt from the CLRS book, and compared it with the implementation given in the introductory computer science book from MIT. I cannot find the problem in my algorithm: IDLE gives me an index out of range error, although everything looks fine to me. I'm unsure if this is due to some confusion in borrowing ideas from the MIT algorithm (see below).
lista = [1,2,3,1,1,1,1,6,7,12,2,7,7,67,4,7,9,6,6,3,1,14,4]

def merge(A, p, q, r):
    q = (p+r)/2
    L = A[p:q+1]
    R = A[q+1:r]
    i = 0
    j = 0
    for k in range(len(A)):
        # if the list R runs out of space and L[i] has nothing to compare
        if i+1 > len(R):
            A[k] = L[i]
            i += 1
        elif j+1 > len(L):
            A[k] = R[j]
            j += 1
        elif L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        elif R[j] <= L[i]:
            A[k] = R[j]
            j += 1
    # when both the sub arrays have run out and all the ifs and elifs are done,
    # the for loop has effectively ended
    return A

def mergesort(A, p, r):
    """A is the list, p is the first index and r is the last index for which
    the portion of the list is to be sorted."""
    q = (p+r)/2
    if p < r:
        mergesort(A, p, q)
        mergesort(A, q+1, r)
        merge(A, p, q, r)
    return A

print mergesort(lista, 0, len(lista)-1)
I have followed the pseudocode in CLRS as closely as I could, just without using the "infinity value" at the end of L and R, which would keep being compared (is this less efficient?). I tried to incorporate the idea from the MIT book, which is to simply copy the remaining L or R list down into A, mutating A and returning a sorted list. However, I can't seem to find what has gone wrong with it. Also, I don't get why the pseudocode requires q as an input, given that q would be calculated as (p+r)/2 for the middle index anyway. And why is there a need to put p < r as a condition?
On the other hand, from the MIT book, we have something that looks really elegant.
def merge(left, right, compare):
    """Assumes left and right are sorted lists and
    compare defines an ordering on the elements.
    Returns a new sorted (by compare) list containing the
    same elements as (left + right) would contain.
    """
    result = []
    i, j = 0, 0
    while i < len(left) and j < len(right):
        if compare(left[i], right[j]):
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    while (i < len(left)):
        result.append(left[i])
        i += 1
    while (j < len(right)):
        result.append(right[j])
        j += 1
    return result

import operator

def mergeSort(L, compare=operator.lt):
    """Assumes L is a list, compare defines an ordering
    on elements of L.
    Returns a new sorted list containing the same elements as L"""
    if len(L) < 2:
        return L[:]
    else:
        middle = len(L) // 2
        left = mergeSort(L[:middle], compare)
        right = mergeSort(L[middle:], compare)
        return merge(left, right, compare)
Where could I have gone wrong?
Also, I think the key difference in the MIT implementation is that it creates a new list instead of mutating the original list. This makes it quite difficult for me to understand mergesort, because I found the CLRS explanation quite clear, by understanding it in terms of different layers of recursion occurring to sort the most minute components of the original list (the list of length 1 that needs no sorting), thus "storing" the results of recursion within the old list itself.
However, thinking about it again, is it right to say that each recursion in the MIT algorithm returns a "result", which is in turn combined by merge?
Thank you!
The fundamental difference between your code and the MIT version is the conditional statement in the mergesort function. Where your if statement is:
if p<r:
theirs is:
if len(L) < 2:
This means that if you have, at any point in the recursive call tree, a list with len(A) == 1, your version would still call merge on a size 1 or even 0 list. You can see that this causes problems in the merge function, because L, R, or both sub-lists can then end up being of size 0, which causes an out of bounds index error.
Your problem could then be easily fixed by changing your if statement to something similar to theirs, like len(A) < 2 or r - p < 2.
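For illustration, here is a rough sketch of the questioner's version with that kind of length guard. It goes slightly beyond the literal if-statement change, because the original merge also iterated over range(len(A)) and sliced R one element short, which have to be adjusted for the whole thing to run; treat it as a sketch rather than the minimal fix:

def merge(A, p, q, r):
    L = A[p:q + 1]
    R = A[q + 1:r + 1]
    i = j = 0
    for k in range(p, r + 1):
        # take from L when R is exhausted, or when L's head is the smaller one
        if j >= len(R) or (i < len(L) and L[i] <= R[j]):
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1
    return A

def mergesort(A, p, r):
    if r - p < 1:          # fewer than two elements in A[p..r]: nothing to sort
        return A
    q = (p + r) // 2
    mergesort(A, p, q)
    mergesort(A, q + 1, r)
    merge(A, p, q, r)
    return A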

Pythonic way to implement three similar integer range operators?

I am working on a circular problem. In this problem, we have objects that are put on a ring of size MAX, and are assigned IDs from (0 to MAX-1).
I have three simple functions to test for range inclusions. inRange(i,j,k) tests if i is in the circular interval [j,k[ (Mnemonic is i inRange(j,k)). And I have the same for ranges ]j,k[ and ]j,k].
The code in those three methods looks duplicated from one method to the next:
def inRange(i, j, k):
    """
    Returns True if i in [j, k[
    * 0 <= i, j, k < MAX
    * no order is assumed between j and k: we can have k < j
    """
    if j <= k:
        return j <= i < k
    # j > k :
    return j <= i or i < k

def inStrictRange(i, j, k):
    """
    Returns True if i in ]j, k[
    * 0 <= i, j, k < MAX
    * no order is assumed between j and k: we can have k < j
    """
    if j <= k:
        return j < i < k
    # j > k :
    return j < i or i < k

def inRange2(i, j, k):
    """
    Returns True if i in ]j, k]
    * 0 <= i, j, k < MAX
    * no order is assumed between j and k: we can have k < j
    """
    if j <= k:
        return j < i <= k
    # j > k :
    return j < i or i <= k
Do you know any cleaner way to implement those three methods? After all, only the operators are changing?!
After thinking of a better solution, I came up with:
from operator import lt, le

def _compare(i, j, k, op1, op2):
    if j <= k:
        return op1(j, i) and op2(i, k)
    return op1(j, i) or op2(i, k)

def inRange(i, j, k):
    return _compare(i, j, k, le, lt)

def inStrictRange(i, j, k):
    return _compare(i, j, k, lt, lt)

def inRange2(i, j, k):
    return _compare(i, j, k, lt, le)
Is it any better? Can you come up with something more intuitive?
In short, what would be the Pythonic way to write these three operators?
Also, I hate the inRange, inStrictRange, inRange2 names, but I can't think of crystal-clear names. Any ideas?
Thanks.
Two Zen of Python principles leap to mind:
Simple is better than complex.
There should be one—and preferably only one—obvious way to do it.
range
The Python built-in function range(start, end) generates a list from start to end.[1] The first element of that list is start, and the last element is end - 1.
There is no range_strict function or inclusive_range function. This was very awkward to me when I started in Python. ("I just want a list from a to b inclusive! How hard is that, Guido?") However, the convention used in calling the range function was simple and easy to remember, and the lack of multiple functions made it easy to remember exactly how to generate a range every time.
Recommendation
As you've probably guessed, my recommendation is to only create a function to test whether i is in the range [j, k). In fact, my recommendation is to keep only your existing inRange function.
(Since your question specifically mentions Pythonicity, I would recommend you name the function as in_range to better fit with the Python Style Guide.)
Justification
Why is this a good idea?
The single function is easy to understand. It is very easy to learn how to use it.
Of course, the same could be said for each of your three starting functions. So far so good.
There is only one function to learn. There are not three functions with unnecessarily similar names.
Given the similar names and behaviours of your three functions, it is somewhat possible that you will, at some point, use the wrong function. This is compounded by the fact that the functions return the same value except for edge cases, which could lead to a hard-to-find off-by-one bug. By only making one function available, you know you will not make such a mistake.
The function is easy to edit.
It is unlikely that you'll need to ever debug or edit such an easy piece of code. However, should you need to do so, you need only edit this one function. With your original three functions, you have to make the same edit in three places. With your revised code in your self-answer, the code is made slightly less intuitive by the operator obfuscation.
The "size" of the range is obvious.
For a given ring where you would use inRange(i, j, k), it is obvious how many elements would be covered by the range [j, k). Here it is in code.
if j <= k:
    size = k - j
if j > k:
    size = k - j + MAX
Therefore:
size = (k - j) % MAX
Caveats
I'm approaching this problem from a completely generic point of view, such as that of a person writing a function for a publicly-released library. Since I don't know your problem domain, I can't say whether this is a practical solution.
Using this solution may mean a fair bit of refactoring of the code that calls these functions. Look through this code to see if editing it is prohibitively difficult or tedious.
[1]: Actually, it is range([start], end, [step]). I trust you get what I mean though.
The Pythonic way to do it is to choose readability, and therefore keep the 3 methods as they were at the beginning.
It's not like they are HUGE methods, or there are thousands of them, or you would have to generate them dynamically.
No higher-order functions, but it's less code, even with the extraneous else.
def exclusive(i, j, k):
    if j <= k:
        return j < i < k
    else:
        return j < i or i < k

def inclusive_left(i, j, k):
    return i == j or exclusive(i, j, k)

def inclusive_right(i, j, k):
    return i == k or exclusive(i, j, k)
I actually tried switching the identifiers to n, a, b, but the code began to look less cohesive. (My point: perfecting this code may not be a productive use of time.)
Now I am thinking of something such as:
def comparator(lop, rop):
    def comp(i, j, k):
        if j <= k:
            return lop(j, i) and rop(i, k)
        return lop(j, i) or rop(i, k)
    return comp

from operator import le, lt

inRange = comparator(le, lt)
inStrictRange = comparator(lt, lt)
inRange2 = comparator(lt, le)
Which looks better indeed.
I certainly agree that you need only one function, and that the function should use a (Pythonic) half-open range.
Two suggestions:
Use meaningful names for the args: in_range(x, lo, hi) is a big improvement relative to the 2-keystroke cost.
Document the fact that the constraint hi < MAX means that it is not possible to express a range that includes all MAX elements. As Wesley remarked, size = (k - j) % MAX, i.e. size = (hi - lo) % MAX, and thus 0 <= size < MAX.
To make it more familiar to your users, I would have one main in_range function with the same bounds as range(). This makes it much easier to remember, and has other nice properties as Wesley mentioned.
def in_range(i, j, k):
    return (j <= i < k) if j <= k else (j <= i or i < k)
You can certainly use this one alone for all your use cases by adding 1 to j and/or k. If you find that you're using a specific form frequently, then you can define it in terms of the main one:
def exclusive(i, j, k):
    """Excludes both endpoints."""
    return in_range(i, j + 1, k)

def inclusive(i, j, k):
    """Includes both endpoints."""
    return in_range(i, j, k + 1)

def weird(i, j, k):
    """Excludes the left endpoint but includes the right endpoint."""
    return in_range(i, j + 1, k + 1)
This is shorter than mucking around with operators, and is also much less confusing to understand. Also, note that you should use underscores instead of camelCase for function names in Python.
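A quick illustration of the wrap-around behaviour of in_range (the numbers are arbitrary; j > k means the interval crosses 0):
>>> in_range(2, 8, 5)    # [8, 5) wraps past 0, so 2 is inside
True
>>> in_range(6, 8, 5)    # 6 falls in the gap [5, 8)
False
>>> inclusive(5, 8, 5)   # right endpoint now included
True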
I'd go one step further than Wesley in aping the normal Python 'in range' idiom; I'd write a cyclic_range class:
import itertools

MAX = 10  # or whatever

class cyclic_range(object):
    def __init__(self, start, stop):
        # mod so you can be a bit sloppy with indices, plus -1 means the last element, as with list indices
        self.start = start % MAX
        self.stop = stop % MAX
    def __len__(self):
        return (self.stop - self.start) % MAX
    def __getitem__(self, i):
        return (self.start + i) % MAX
    def __contains__(self, x):
        if (self.start < self.stop):
            return (x >= self.start) and (x < self.stop)
        else:
            return (x >= self.start) or (x < self.stop)
    def __iter__(self):
        for i in xrange(len(self)):
            yield self[i]
    def __eq__(self, other):
        if (len(self) != len(other)): return False
        for a, b in itertools.izip(self, other):
            if (a != b): return False
        return True
    def __hash__(self):
        return (self.start << 1) + self.stop
    def __str__(self):
        return str(list(self))
    def __repr__(self):
        return "cyclic_range(" + str(self.start) + ", " + str(self.stop) + ")"
    # and whatever other list-like methods you fancy
You can then write code like:
if (myIndex in cyclic_range(firstNode, stopNode)):
    blah
To do the equivalent of inRange. To do inStrictRange, write:
if (myIndex in cyclic_range(firstNode + 1, stopNode)):
And to do inRange2:
if (myIndex in cyclic_range(firstNode + 1, stopNode + 1)):
If you don't like doing the additions by hand, how about adding these methods:
def strict(self):
    return cyclic_range(self.start + 1, self.stop)

def right_closed(self):
    return cyclic_range(self.start + 1, self.stop + 1)
And then doing:
if (myIndex in cyclic_range(firstNode, stopNode).strict()): # inStrictRange
if (myIndex in cyclic_range(firstNode, stopNode).right_closed()): # inRange2
Whilst this approach is, IMHO, more readable, it does involve doing an allocation, rather than just a function call, which is more expensive - although still O(1). But then if you really cared about performance, you wouldn't be using python!
