What is your preferred method of traversing a tree data structure, given that recursive method calls can be pretty inefficient in some circumstances? I am simply using a generator like the one below. Do you have any hints to make it faster?
def children(self):
    stack = [self.entities]
    while stack:
        for e in stack.pop():
            yield e
            if e.entities:
                stack.append(e.entities)
Here are some timings (the full test code is below).
The first run is recursive, the second uses the generator:
s = time.time()
for i in range(100000):
    root.inc_counter_r()
print time.time() - s

s = time.time()
for i in range(100000):
    for e in root.children():
        e.counter += 1
print time.time() - s
Results:
0.416000127792
0.298999786377
Test code:
import random
import time

class Entity(object):
    def __init__(self, name):
        self.entities = []
        self.name = name
        self.counter = 1
        self.depth = 0

    def add_entity(self, e):
        e.depth = self.depth + 1
        self.entities.append(e)

    def inc_counter_r(self):
        for e in self.entities:
            e.counter += 1
            e.inc_counter_r()

    def children(self):
        stack = [self.entities]
        while stack:
            for e in stack.pop():
                yield e
                if e.entities:
                    stack.append(e.entities)

root = Entity("main")

def fill_node(root, max_depth):
    if root.depth <= max_depth:
        for i in range(random.randint(10, 15)):
            e = Entity("node_%s_%s" % (root.depth, i))
            root.add_entity(e)
            fill_node(e, max_depth)

fill_node(root, 3)

s = time.time()
for i in range(100):
    root.inc_counter_r()
print "recursive:", time.time() - s

s = time.time()
for i in range(100):
    for e in root.children():
        e.counter += 1
print "generator:", time.time() - s
I can't think of any big algorithmic improvements, but a simple micro-optimisation you can make is to bind frequently called methods (such as stack.append / stack.pop) to local names, which saves an attribute lookup on each call:
def children(self):
    stack = [self.entities]
    push = stack.append
    pop = stack.pop
    while stack:
        for e in pop():
            yield e
            if e.entities:
                push(e.entities)
This gives a small (~15%) speedup in my tests (100 traversals of an 8-deep tree with 4 children at each node gives me the timings below):
children : 5.53942348004
children_bind: 4.77636131253
Not huge, but worth doing if speed is important.
Unless your tree is really large or you have really high (real) requirements for speed, I would choose the recursive method. Easier to read, easier to code.
Recursive function calls are not incredibly inefficient; that is an old programming myth. (If they're badly implemented, they may incur a larger overhead than necessary, but calling them "incredibly inefficient" is plain wrong.)
Remember: don't optimize prematurely, and never optimize without benchmarking first.
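For instance, a minimal benchmarking sketch with timeit, reusing the root tree from the question's test code (number=100 mirrors the question's loop count):

import timeit

def bump_with_generator():
    for e in root.children():
        e.counter += 1

# timeit accepts any zero-argument callable
print "recursive: %.4f" % timeit.timeit(root.inc_counter_r, number=100)
print "generator: %.4f" % timeit.timeit(bump_with_generator, number=100)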
I'm not sure you can reduce the overhead much on a full in-order traversal of a tree. With recursion, the call stack will grow some; otherwise you must manually use a stack to push references to the children while visiting each node. Which way is faster and uses less memory depends on the cost of the call stack versus a normal stack. (I would guess the call stack is faster, since it should be optimized for its use, and recursion is much easier to implement.)
If you don't care about the order in which you visit the nodes, note that some tree implementations actually store the nodes in a dynamic array, linked list, or stack, which you can then traverse linearly.
But why is it important to have a fast traversal anyway? Trees are good for searching; arrays and linked lists are good for full traversal. If you often need a full in-order traversal but few searches and insertions/deletions, an ordered linked list might be best; if searching is what you do most, use a tree. If the data is really massive, so that memory overhead may render recursion impossible, you should use a database.
If you have a lot of RAM and the tree doesn't change often, you can cache the result of the call:
def children(self):
    if self._children_cache is not None:
        return self._children_cache
    # Put your traversal code into collectChildren()
    self._children_cache = self.collectChildren()
    return self._children_cache
Whenever the tree changes, set the cache to None. In this case, using recursive calls might be more effective since the results will accumulate faster.
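A minimal invalidation sketch, assuming the Entity class from the question; collectChildren() is the hypothetical helper holding the traversal code, as in the snippet above:

class Entity(object):
    def __init__(self, name):
        self.entities = []
        self.name = name
        self._children_cache = None

    def add_entity(self, e):
        self.entities.append(e)
        self._children_cache = None  # the tree changed: drop the cached traversal

    def collectChildren(self):
        # any full traversal that returns a list of nodes will do
        result = []
        stack = [self.entities]
        while stack:
            for child in stack.pop():
                result.append(child)
                if child.entities:
                    stack.append(child.entities)
        return result

    def children(self):
        if self._children_cache is None:
            self._children_cache = self.collectChildren()
        return self._children_cache

Note that in a real tree you would also need to invalidate the caches of all ancestors when a deep node changes.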
I've written iterative tree-traversal code in the past: it's very ugly, and not fast, unless you know exactly how many children not only each subtree will have, but how many levels there are.
I don't know too much about Python internals of function calls, but I really can't imagine that your code snippet is faster than recursively traversing the tree.
The call stack (used for function calls, including recursive ones) is typically very fast. Going to the next object costs only a single function call. But in your snippet, where you use a stack object, going to the next object costs a stack.append (possibly allocating memory on the heap), a stack.pop (possibly freeing heap memory), and a yield.
The main problem with recursive calls is that you might blow the stack if your tree gets too deep. This isn't likely to happen.
Here's a small correction and simplification.
def children(self):
    stack = list(self.entities)  # copy, so we can safely grow it while iterating
    for e in stack:
        yield e
        if e.entities:
            stack.extend(e.entities)
With extend, the stack holds the entities themselves rather than lists of entities, so the nested while/for loops collapse into a single loop over the stack.
Iterating over the stack while extending it works because the for loop keeps indexing into the growing list; once no more entities are appended, the loop simply runs off the end and terminates.
I am iterating the search space of valid Python3 ASTs. With max recursion depth = 3, my laptop runs out of memory. My implementation makes heavy use of generators, specifically 'yield' and itertools.product().
Ideally, I'd replace product() and the max recursion depth with some sort of iterative deepening, but first things first:
Are there any libraries or useful SO posts for out-of-core/external-memory combinatorics?
If not... I am considering the feasibility of using either dask or joblib's delayed()... or perhaps wendelin-core's ZBigArray, though I don't like the looks of its interface:
root = dbopen('test.fs')
root['A'] = A = ZBigArray((10,), np.int)
transaction.commit()
Based on this example, I think my solution would involve an annotation/wrapper function that eagerly converts the generators to ZBigArrays, replacing root['A'] with something like root[get_key(function_name, *function_args)]. It's not pretty, since my generators are not entirely pure: the output is shuffled. In my current version this shouldn't be a big deal, but the previous and next versions involve using various NNs and RL rather than mere shuffling.
First things first: the reason you're getting the out-of-memory error is that itertools.product() caches intermediate values. It has no idea whether the function that gave you your generator is idempotent, and even if it did, it wouldn't be able to infer how to call it again given just the generator. This means itertools.product must cache the values of each iterable it's passed.
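You can see that caching with a small experiment (illustrative):

import itertools

def numbers():
    print("consumed")  # runs as soon as product() drains the generator
    for i in range(3):
        yield i

# product() pulls every input iterable into an in-memory pool up front, so
# "consumed" is printed here, before a single result tuple has been requested
p = itertools.product(numbers(), repeat=2)
print(next(p))  # (0, 0)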
The solution here is to bite the small performance bullet and either write explicit for loops, or write your own cartesian product function, which takes functions that would produce each generator. For instance:
def product(*funcs, repeat=None):
    if not funcs:
        yield ()
        return
    if repeat is not None:
        funcs *= repeat
    func, *rest = funcs
    for val in func():
        for res in product(*rest):
            yield (val,) + res
from functools import partial
values = product(partial(gen1, arg1, arg2), partial(gen2, arg1))
The bonus of rolling your own here is that you can also change how it traverses the A by B by C ... dimensional search space, so that you could do a breadth-first search instead of an iteratively-deepening DFS, or perhaps pick some space-filling curve, such as the Hilbert curve, which would iterate over the indices/depths of each dimension in your product() in a locality-preserving fashion.
Apart from that, I have one more thing to point out: you can also implement BFS lazily (using generators) to avoid building a queue that could bloat memory usage as well. See this SO answer, copied below for convenience:
def breadth_first(self):
    yield self
    for c in self.breadth_first():
        if not c.children:
            return  # stop the recursion as soon as we hit a leaf
        yield from c.children
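A quick usage sketch (Node is illustrative; any object with a children list works, and note that this trick assumes leaves only appear at the deepest levels):

class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def breadth_first(self):  # the generator from above, attached as a method
        yield self
        for c in self.breadth_first():
            if not c.children:
                return
            yield from c.children

tree = Node("root", [Node("a", [Node("a1"), Node("a2")]),
                     Node("b", [Node("b1"), Node("b2")])])
print([n.name for n in tree.breadth_first()])
# ['root', 'a', 'b', 'a1', 'a2', 'b1', 'b2']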
Overall, you will take a performance hit from using semi-coroutines, with zero caching, all in Python-land (compared to the baked-in and heavily optimized C of CPython). However, it should still be doable: algorithmic optimizations (avoiding generating semantically nonsensical ASTs, prioritizing ASTs that suit your goal, etc.) will have a larger impact than the constant-factor performance hit.
from pythonds.basic.stack import Stack

rStack = Stack()

def toStr(n, base):
    convertString = "0123456789ABCDEF"
    while n > 0:
        if n < base:
            rStack.push(convertString[n])
        else:
            rStack.push(convertString[n % base])
        n = n // base
    res = ""
    while not rStack.isEmpty():
        res = res + str(rStack.pop())
    return res

print(toStr(1345, 2))
I'm referring to this tutorial and also pasted the code above. The tutorial says the function is recursive but I don't see a recursive call anywhere, just a while loop. What am I missing?
You are right that this particular function is not recursive. However, the context is that the previous slide showed a recursive function, and in this one they want to show a glimpse of how it behaves internally. They later say:
The previous example [i.e. the one in question - B.] gives us some insight into how Python implements a recursive function call.
So, yes, the title is misleading; it should rather be Expanding a recursive function or Imitating recursive function behavior with a stack or something like that.
One may say that this function employs a recursive approach/strategy, in some sense, to the problem being solved, but it is not recursive itself.
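For reference, the genuinely recursive function from the previous slide presumably looks something like this classic formulation (a sketch, not the tutorial's verbatim code):

def toStrRecursive(n, base):
    convertString = "0123456789ABCDEF"
    if n < base:
        return convertString[n]
    # solve the smaller instance first, then append this digit
    return toStrRecursive(n // base, base) + convertString[n % base]

print(toStrRecursive(1345, 2))  # 10101000001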
A recursive algorithm, by definition, is a method where the solution to a problem depends on solutions to smaller instances of the same problem.
Here, the problem is to convert a number to a string in a given notation.
The "stockpiling" of data the function does actually looks like this:
push(d[1])
push(d[2])
...
push(d[n-1])
push(d[n])
res += pop(d[n])
res += pop(d[n-1])
...
res += pop(d[2])
res += pop(d[1])
which is effectively:
def pushpop():
    push(d[x])
    pushpop(d[x+1] ... d[n])
    res += pop(d[x])
I.e. a step that processes a specific chunk of data encloses all the steps that process the rest of the data (with each chunk processed in the same way).
One can argue whether the function itself is recursive (the term tends to be applied to subroutines in a narrower sense), but the algorithm it implements definitely is.
For you to better feel the difference, here's an iterative solution to the same problem:
def toStr(n, base):
    charmap = "0123456789ABCDEF"
    res = ''
    while n > 0:
        res = charmap[n % base] + res
        n = n // base
    return res
As you can see, this method has a much lower memory footprint, as it doesn't stockpile tasks. This is the difference: an iterative algorithm performs each step using the same instance of the state, mutating it, while a recursive one creates a new instance for each step, necessarily stockpiling them if the old ones are still needed.
Because you're using a stack structure.
If you consider how function calling is implemented, recursion is essentially an easy way to get the compiler to manage a stack of invocations for you.
This function does all the stack handling manually, but it is still conceptually a recursive function, just one where the stack management is done manually instead of letting the compiler do it.
I have implemented an LFU cache in Python with the help of the priority queue implementation notes given at
https://docs.python.org/2/library/heapq.html#priority-queue-implementation-notes
The code is given at the end of the post.
But I feel that the code has some serious problems:
1. To give a scenario: suppose only one page is continuously being visited (say 50 times). This code will always mark the already-added node as "removed" and push a fresh node onto the heap, so it ends up with 50 different nodes for the same page, increasing the heap size enormously.
2. This question is similar to Q1 of the telephonic interview at http://www.geeksforgeeks.org/flipkart-interview-set-2-sde-2/, where it is mentioned that a doubly linked list can give better efficiency than a heap. Can anyone explain how?
from llist import dllist
import sys
from heapq import heappush, heappop

class LFUCache:
    heap = []
    cache_map = {}
    REMOVED = "<removed-task>"

    def __init__(self, cache_size):
        self.cache_size = cache_size

    def get_page_content(self, page_no):
        if self.cache_map.has_key(page_no):
            self.update_frequency_of_page_in_cache(page_no)
        else:
            self.add_page_in_cache(page_no)
        return self.cache_map[page_no][2]

    def add_page_in_cache(self, page_no):
        if len(self.cache_map) == self.cache_size:
            self.delete_page_from_cache()
        heap_node = [1, page_no, "content of page " + str(page_no)]
        heappush(self.heap, heap_node)
        self.cache_map[page_no] = heap_node

    def delete_page_from_cache(self):
        while self.heap:
            count, page_no, page_content = heappop(self.heap)
            if page_content is not self.REMOVED:
                del self.cache_map[page_no]
                return

    def update_frequency_of_page_in_cache(self, page_no):
        heap_node = self.cache_map[page_no]
        heap_node[2] = self.REMOVED
        count = heap_node[0]
        heap_node = [count + 1, page_no, "content of page " + str(page_no)]
        heappush(self.heap, heap_node)
        self.cache_map[page_no] = heap_node

def main():
    cache_size = int(raw_input("Enter cache size "))
    cache = LFUCache(cache_size)
    while 1:
        page_no = int(raw_input("Enter page no needed "))
        print cache.get_page_content(page_no)
        print cache.heap, cache.cache_map, "\n"

if __name__ == "__main__":
    main()
Efficiency is a tricky thing. In real-world applications, it's often a good idea to use the simplest and easiest algorithm, and only start to optimize when that's measurably slow. And then you optimize by doing profiling to figure out where the code is slow.
If you are using CPython, it gets especially tricky, as even an inefficient algorithm implemented in C can beat an efficient algorithm implemented in Python due to the large constant factors; e.g. a double-linked list implemented in Python tends to be a lot slower than simply using the normal Python list, even for cases where in theory it should be faster.
Simple algorithm:
For an LFU, the simplest algorithm is to use a dictionary that maps keys to (item, frequency) objects, and update the frequency on each access. This makes access very fast (O(1)), but pruning the cache is slower as you need to sort by frequency to cut off the least-used elements. For certain usage characteristics, this is actually faster than other "smarter" solutions, though.
You can optimize for this pattern by not pruning your LFU cache exactly to the maximum length, but pruning it to, say, 50% of the maximum length once it grows too large. That means your prune operation is called infrequently, so it can be inefficient compared to the read operation.
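A minimal sketch of that simple dictionary-based approach with relaxed pruning (all names illustrative):

class SimpleLFU(object):
    def __init__(self, max_size):
        self.max_size = max_size
        self.store = {}  # key -> [value, frequency]

    def get(self, key):
        entry = self.store[key]
        entry[1] += 1  # O(1) access: just bump the frequency
        return entry[0]

    def put(self, key, value):
        self.store[key] = [value, 0]
        if len(self.store) > self.max_size:
            self._prune()

    def _prune(self):
        # called rarely, so the O(n log n) sort is acceptable: keep only the
        # most frequently used half of max_size, as suggested above
        by_freq = sorted(self.store.items(), key=lambda kv: kv[1][1], reverse=True)
        self.store = dict(by_freq[:self.max_size // 2])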
Using a heap:
In (1), you used a heap because that's an efficient way of storing a priority queue. But you are not implementing a priority queue. The resulting algorithm is optimized for pruning, but not access: You can easily find the n smallest elements, but it's not quite as obvious how to update the priority of an existing element. In theory, you'd have to rebalance the heap after every access, which is highly inefficient.
To avoid that, you added a trick: keeping elements around even after they are logically deleted. But this trades space for time.
If you don't want to pay that space cost, you could instead update the frequencies in place and simply rebalance the heap before pruning the cache. You regain fast access times at the expense of slower pruning, like the simple algorithm above. (I doubt there is any speed difference between the two, but I have not measured it.)
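A sketch of that variant, reusing self.heap and self.cache_map from the question's code; the access path would then just do heap_node[0] += 1 instead of marking and re-pushing:

from heapq import heapify, heappop

def delete_page_from_cache(self):
    # frequencies were bumped in place, so the heap invariant may be broken;
    # restore it once per prune (O(n)) instead of paying a push on every access
    heapify(self.heap)
    count, page_no, page_content = heappop(self.heap)
    del self.cache_map[page_no]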
Using a double-linked list:
The double-linked list mentioned in (2) takes advantage of the nature of the possible changes here: An element is either added as the lowest priority (0 accesses), or an existing element's priority is incremented exactly by 1. You can use these attributes to your advantage if you design your data structures like this:
You have a double-linked list of elements which is ordered by the frequency of the elements. In addition, you have a dictionary that maps items to elements within that list.
Accessing an element then means:
Either it's not in the dictionary, that is, it's a new item, in which case you can simply append it to the end of the double-linked list (O(1))
or it's in the dictionary, in which case you increment the frequency in the element and move it leftwards through the double-linked list until the list is ordered again (O(n) worst-case, but usually closer to O(1)).
To prune the cache, you simply cut off n elements from the end of the list (O(n)).
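A minimal sketch of that design (all names illustrative; the value payload is omitted to keep it short):

class Node(object):
    __slots__ = ("key", "freq", "prev", "next")

    def __init__(self, key):
        self.key = key
        self.freq = 0
        self.prev = None
        self.next = None

class FreqList(object):
    # doubly-linked list kept ordered by frequency, most frequent at the head
    def __init__(self):
        self.head = None
        self.tail = None
        self.index = {}  # key -> Node

    def access(self, key):
        node = self.index.get(key)
        if node is None:
            node = Node(key)
            self.index[key] = node
            self._push_tail(node)  # new items start at the low-frequency end: O(1)
        node.freq += 1
        # bubble leftwards until the list is ordered again (usually one step)
        while node.prev is not None and node.prev.freq < node.freq:
            self._move_before_prev(node)

    def prune(self, n):
        # cut n least-frequently-used elements off the tail: O(n)
        for _ in range(n):
            if self.tail is None:
                break
            node = self.tail
            self.tail = node.prev
            if self.tail is not None:
                self.tail.next = None
            else:
                self.head = None
            del self.index[node.key]

    def _push_tail(self, node):
        node.prev, node.next = self.tail, None
        if self.tail is not None:
            self.tail.next = node
        else:
            self.head = node
        self.tail = node

    def _move_before_prev(self, node):
        # turn  A <-> p <-> node <-> B  into  A <-> node <-> p <-> B
        p = node.prev
        p.next = node.next
        if node.next is not None:
            node.next.prev = p
        else:
            self.tail = p
        node.prev = p.prev
        node.next = p
        if p.prev is not None:
            p.prev.next = node
        else:
            self.head = node
        p.prev = node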
I am trying to calculate the length of a list. When I run it on cmd, I get:
RuntimeError: maximum recursion depth exceeded in comparison
I don't think there's anything wrong with my code:
def len_recursive(list):
    if list == []:
        return 0
    else:
        return 1 + len_recursive(list[1:])
Don't use recursion unless you can predict that it is not too deep; Python has quite a small limit on recursion depth.
If you insist on recursion, the efficient way is:
def len_recursive(lst):
    if not lst:
        return 0
    # count the head, then recurse on the odd- and even-indexed halves of the
    # tail: this keeps the recursion depth at O(log n) instead of O(n)
    return 1 + len_recursive(lst[1::2]) + len_recursive(lst[2::2])
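A quick check that the depth really stays manageable (a sketch):

import sys

print(sys.getrecursionlimit())  # typically 1000
lst = list(range(10 ** 6))
print(len_recursive(lst))  # 1000000; the depth is only about log2(10**6), i.e. 20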
The recursion depth in Python is limited, but can be increased as shown in this post. If Python had support for the tail call optimization, this solution would work for arbitrary-length lists:
def len_recursive(lst):
    def loop(lst, acc):
        if not lst:
            return acc
        return loop(lst[1:], acc + 1)
    return loop(lst, 0)
But as it is, you will have to use shorter lists and/or increase the maximum recursion depth allowed.
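For completeness, a sketch of raising the limit (use with care: deep enough recursion can still overflow the interpreter's C stack):

import sys

sys.setrecursionlimit(10000)
# the definitions above can now handle noticeably longer lists,
# at the cost of one stack frame per element
print(len_recursive(list(range(5000))))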
Of course, no one would use this implementation in real life (instead using the len() built-in function), I'm guessing this is an academic example of recursion, but even so the best approach here would be to use iteration, as shown in #poke's answer.
As others have explained, there are two problems with your function:
It's not tail-recursive, so it can only handle lists up to about sys.getrecursionlimit() in length.
Even if it were tail-recursive, Python doesn't do tail recursion optimization.
The first is easy to solve. For example, see Óscar López's answer.
The second is hard to solve, but not impossible. One approach is to use coroutines (built on generators) instead of subroutines. Another is to not actually call the function recursively, but instead return a function with the recursive result, and use a driver that applies the results. See Tail Recursion in Python by Paul Butler for an example of how to implement the latter, but here's what it would look like in your case.
Start with Paul Butler's tail_rec function:
def tail_rec(fun):
    def tail(fun):
        a = fun
        while callable(a):
            a = a()
        return a
    return (lambda x: tail(fun(x)))
This doesn't work as a decorator for his case, because he has two mutually recursive functions. But in your case, that's not an issue. So, using Óscar López's version:
@tail_rec
def tail_len(lst):
    def loop(lst, acc):
        if not lst:
            return acc
        return lambda: loop(lst[1:], acc + 1)
    return lambda: loop(lst, 0)
And now:
>>> print tail_len(range(10000))
10000
Tada.
If you actually wanted to use this, you might want to make tail_rec into a nicer decorator:
import functools

def tail_rec(fun):
    def tail(fun):
        a = fun
        while callable(a):
            a = a()
        return a
    return functools.update_wrapper(lambda x: tail(fun(x)), fun)
Imagine you're running this using a stack of paper. You want to count how many sheets you have. If someone gives you 10 sheets, you take the first sheet, put it down on the table, and grab the next one, placing it next to the first. You do this 10 times, and your desk is pretty full, but you've set out each sheet. You then start to count every page, recycling each as you count it up: 0 + 1 + 1 + ... => 10. This isn't the best way to count pages, but it mirrors the recursive approach and Python's implementation.
This works for small numbers of pages. Now imagine someone gives you 10000 sheets. Pretty soon there is no room on your desk to set out each page. This is essentially what the error message is telling you.
The maximum recursion depth is "how many sheets" the table can hold. Each time you recurse, Python needs to keep the "1 + result of recursive call" around so that when all the pages have been laid out it can come back and count them up. Unfortunately, you run out of space before the final counting-up occurs.
If you want to do this recursively to learn (you'd use len() in any real situation), just use small lists; 25 should be fine.
Some systems could handle this for large lists if they supported tail calls.
Your exception message means that your method is called recursively too often, so it's likely that your list is just too long to count the elements recursively like that. You could do it simply using an iterative solution though:
def len_iterative(lst):
    length = 0
    while lst:
        length += 1
        lst = lst[1:]
    return length
Note that this will very likely still be a terrible solution, as lst[1:] keeps creating copies of the list, so you will end up with len(lst) + 1 list instances (with lengths 0 to len(lst)). It is probably best to just use the built-in len directly, but I guess it was an assignment.
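If the copies were the concern, a sketch that avoids slicing entirely (still pointless next to the built-in len, of course):

def len_iterative(lst):
    length = 0
    for _ in lst:  # iterate without slicing: O(n) time, O(1) extra space
        length += 1
    return length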
Python doesn't optimise tail calls, so using recursive algorithms like this isn't a good idea.
You can raise the limit with sys.setrecursionlimit(), but it's still not a good idea.
What are my options there? I need to call a lot of appends (to the right end) and poplefts (from the left end, naturally), but also to read from the middle of the storage, which will steadily grow by the nature of the algorithm. I would like to have all these operations in O(1).
I could implement it easily enough in C on a circularly-addressed array (what's the word?) that grows automatically when full; but what about Python? Pointers to other languages are appreciated too (I realize the "collections" tag is more Java-oriented and would appreciate the comparison, but only as a secondary goal).
I come from a Lisp background and was amazed to learn that in Python removing a head element from a list is an O(n) operation. A deque could be an answer, except the documentation says access is O(n) in the middle. Is there anything else, pre-built?
You can get an amortized O(1) data structure by using two Python lists, one holding the left half of the deque and the other holding the right half. The left half is stored reversed, so the left end of the deque is at the back of its list. Something like this:
class mydeque(object):
    def __init__(self):
        self.left = []   # left half, reversed: the deque's left end is at the back
        self.right = []  # right half: the deque's right end is at the back

    def pushleft(self, v):
        self.left.append(v)

    def pushright(self, v):
        self.right.append(v)

    def popleft(self):
        if not self.left:
            self.__fill_left()
        return self.left.pop()

    def popright(self):
        if not self.right:
            self.__fill_right()
        return self.right.pop()

    def __len__(self):
        return len(self.left) + len(self.right)

    def __getitem__(self, i):
        if i >= len(self.left):
            return self.right[i - len(self.left)]
        else:
            return self.left[-(i + 1)]

    def __fill_right(self):
        # move half of the left list (at least one element, so a pop on a
        # non-empty deque never fails) over to the empty right list
        x = max(1, len(self.left) // 2)
        self.right.extend(self.left[0:x])
        self.right.reverse()
        del self.left[0:x]

    def __fill_left(self):
        x = max(1, len(self.right) // 2)
        self.left.extend(self.right[0:x])
        self.left.reverse()
        del self.right[0:x]
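A quick usage sketch:

d = mydeque()
for i in range(5):
    d.pushright(i)  # deque now holds 0 1 2 3 4
d.pushleft(-1)      # -1 0 1 2 3 4
print(d.popleft())  # -1
print(d[2])         # 2 (indexing the middle stays O(1))
print(len(d))       # 5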
I'm not 100% sure whether the interaction between this code and the amortized performance of Python's lists actually results in O(1) for each operation, but my gut says so.
Accessing the middle of a Lisp list is also O(n).
Python lists are array lists, which is why popping the head is expensive (popping the tail is constant time).
What you are looking for is an array with (amortised) constant-time deletion at the head; that basically means you are going to have to build a data structure on top of a list that uses lazy deletion and is able to recycle the lazily-deleted slots when the queue is empty.
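A minimal sketch of that lazy-deletion idea (all names illustrative):

class LazyQueue(object):
    def __init__(self):
        self.items = []
        self.head = 0  # index of the logical front; everything before it is dead

    def append(self, v):
        self.items.append(v)

    def popleft(self):
        v = self.items[self.head]
        self.head += 1
        # recycle slots once the dead prefix dominates the list,
        # keeping popleft amortized O(1)
        if self.head > len(self.items) // 2:
            del self.items[:self.head]
            self.head = 0
        return v

    def __len__(self):
        return len(self.items) - self.head

    def __getitem__(self, i):
        return self.items[self.head + i]  # O(1) access anywhere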
Alternatively, use a hashtable, and a couple of integers to keep track of the current contiguous range of keys.
Python's Queue module may help you, although I'm not sure whether access is O(1).