Pass list by reference in python through recursion - python

I am using a recursive function call to traverse a tree and I want to add the locations of valuable nodes to a master list. My current method is to use a global. How do I pass this list by reference instead (or solve this in another way without globals)?
hcList = []
def expand(node):
    global hcList
    if node.hasTreasure():
        hcList.append(node)
    if not node.end():
        expand(node.next())
global hcList
expand(startnode)
hcList.filter()
Is there any way to do something like the code below without using a hairy global? My actual code is much messier with globals, but the concepts are the same. The code below doesn't work the way I want it to. Namely, hcList is empty.
def expand(node, hcList):
    if node.hasTreasure():
        hcList.append(node)
    if not node.end():
        expand(node.next(), hcList)
hcList = []
expand(startnode, hcList)
hcList.filter()
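For reference, Python hands the list object itself to the callee, so in-place changes such as append are visible to the caller, while rebinding the parameter name inside the function is not. A minimal sketch of that difference (the function names are made up for illustration and are not part of the question's code):

```python
def append_item(items):
    # Mutates the very list object the caller passed in.
    items.append("gold")


def rebind(items):
    # Rebinds only the local name; the caller's list is left untouched.
    items = ["silver"]
    return items


found = []
append_item(found)
rebound = rebind(found)
```

After these calls, found holds the appended item, and the list built inside rebind is a separate object.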

For recursion, it is frequently simpler to return the new value:
def expand(node, hcList):
    if node.hasTreasure():
        hcList.append(node)
    if node.end():
        return hcList
    return expand(node.next(), hcList)
hcList = expand(startnode, [])
hcList.filter() # not sure why this was in the OP
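If your list is very deep, you may have a lot on the stack. Note that CPython does not perform tail-call optimization, so a sufficiently deep chain of calls will raise RecursionError; in that case a plain loop is the safer choice.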
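One way to sidestep stack depth entirely is to write the same traversal as a loop. The sketch below assumes a hypothetical Node class that mimics the hasTreasure(), end() and next() methods used in the question; the real node type is not shown in the post.

```python
class Node:
    """Hypothetical stand-in for the question's node type."""

    def __init__(self, name, treasure=False, nxt=None):
        self.name = name
        self._treasure = treasure
        self._next = nxt

    def hasTreasure(self):
        return self._treasure

    def end(self):
        return self._next is None

    def next(self):
        return self._next


def collect(node):
    """Walk the chain with a loop and return the nodes that hold treasure."""
    found = []
    while True:
        if node.hasTreasure():
            found.append(node)
        if node.end():
            return found
        node = node.next()


def build_chain(length):
    """Build a linked chain 0 -> 1 -> ... -> length - 1 (illustration only)."""
    node = None
    for i in reversed(range(length)):
        node = Node(i, treasure=(i % 1000 == 0), nxt=node)
    return node


names = [n.name for n in collect(build_chain(5000))]
```

Because no call is nested inside another, the length of the chain does not affect stack usage.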

Related

Defining `fac` with generators. And: Why no stack overflow with generators?

Is there a way we can define the following code (a classic example for recursion) via generators in Python? I am using Python 3.
def fac(n):
    if n==0:
        return 1
    else:
        return n * fac(n-1)
I tried this, no success:
In [1]: def fib(n):
   ...:     if n == 0:
   ...:         yield 1
   ...:     else:
   ...:         n * yield (n-1)
  File "<ipython-input-1-bb0068f2d061>", line 5
    n * yield (n-1)
        ^
SyntaxError: invalid syntax
Classic recursion in Python leads to Stack Overflow
This classic example leads to a stack overflow on my machine for an input of n=3000. In the Lisp dialect "Scheme" I'd use tail recursion and avoid the stack overflow. That is not possible in Python, which is why generators come in handy. But I wonder:
Why no stack overflow with generators?
Why is there no stack overflow with generators in Python? How do they work internally? Doing some research leads me always to examples showing how generators are used in Python, but not much about the inner workings.
Update 1: yield from my_function(...)
As I tried to explain in the comments section, maybe my example above was a poor choice for making a point. My actual question was targeted at the inner workings of generators used recursively in yield from statements in Python 3.
Below is an (incomplete) code example that I use to process JSON files generated by Firefox bookmark backups. At several points I use yield from process_json(...) to recursively call the function again via generators.
Exactly in this example, how is stack overflow avoided? Or is it?
# (snip)
FOLDERS_AND_BOOKMARKS = {}
FOLDERS_DATES = {}
def process_json(json_input, folder_path=""):
    global FOLDERS_AND_BOOKMARKS
    # Process the json with a generator
    # (to avoid recursion use generators)
    # https://stackoverflow.com/a/39016088/5115219
    # Is node a dict?
    if isinstance(json_input, dict):
        # we have a dict
        guid = json_input['guid']
        title = json_input['title']
        idx = json_input['index']
        date_added = to_datetime_applescript(json_input['dateAdded'])
        last_modified = to_datetime_applescript(json_input['lastModified'])
        # do we have a container or a bookmark?
        #
        # is there a "uri" in the dict?
        # if not, we have a container
        if "uri" in json_input.keys():
            uri = json_input['uri']
            # return URL with folder or container (= prev_title)
            # bookmark = [guid, title, idx, uri, date_added, last_modified]
            bookmark = {'title': title,
                        'uri': uri,
                        'date_added': date_added,
                        'last_modified': last_modified}
            FOLDERS_AND_BOOKMARKS[folder_path].append(bookmark)
            yield bookmark
        elif "children" in json_input.keys():
            # So we have a container (aka folder).
            #
            # Create a new folder
            if title != "": # we are not at the root
                folder_path = f"{folder_path}/{title}"
            if folder_path in FOLDERS_AND_BOOKMARKS:
                pass
            else:
                FOLDERS_AND_BOOKMARKS[folder_path] = []
                FOLDERS_DATES[folder_path] = {'date_added': date_added, 'last_modified': last_modified}
            # run process_json on list of children
            # json_input['children'] : list of dicts
            yield from process_json(json_input['children'], folder_path)
    # Or is node a list of dicts?
    elif isinstance(json_input, list):
        # Process children of container.
        dict_list = json_input
        for d in dict_list:
            yield from process_json(d, folder_path)
Update 2: yield vs yield from
Ok, I get it. Thanks to all the comments.
So generators via yield create iterators. That has nothing to do with recursion, so no stack overflow here.
But generators via yield from my_function(...) are indeed recursive calls of my function, albeit delayed, and only evaluated if demanded.
This second example can indeed cause a stack overflow.
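The distinction in this update can be checked with a small experiment. The sketch below uses made-up generator names and assumes CPython with its default recursion limit (roughly 1000 nested frames): a loop-based generator is compared with one that delegates to a fresh copy of itself at every level.

```python
def flat_counter(n):
    """Plain generator: a single frame with a loop, no nesting."""
    for i in range(n):
        yield i


def nested_counter(depth):
    """Delegates to a new generator at every level via yield from."""
    if depth == 0:
        yield 0
    else:
        yield from nested_counter(depth - 1)


def hits_recursion_limit(make_generator):
    """Exhaust the generator and report whether RecursionError was raised."""
    try:
        for _ in make_generator():
            pass
    except RecursionError:
        return True
    return False


flat_overflows = hits_recursion_limit(lambda: flat_counter(5000))
nested_overflows = hits_recursion_limit(lambda: nested_counter(5000))
```

Under those assumptions, only the nested version is expected to hit the limit, because resuming the innermost generator requires every generator in the yield from chain to be active at the same time.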
OK, after your comments I have completely rewritten my answer.
How does recursion work and why do we get a stack overflow?
Recursion is often an elegant way to solve a problem. In most programming languages, every time you call a function, all the information and state needed for the function is put on the stack in a so-called "stack frame". The stack is a special per-thread memory region and is limited in size.
Now recursive functions implicitly use these stack frames to store state and intermediate results. E.g., the factorial function is n * (n-1) * (n-2) * ... * 1, and all these intermediate values are stored on the stack.
An iterative solution has to store these intermediate results explicitly in a variable (that often sits in a single stack frame).
How do generators avoid stack overflow?
Simply: They are not recursive. They are implemented like iterator objects. They store the current state of the computation and return a new result every time you request it (implicitly or with next()).
If it looks recursive, that's just syntactic sugar. "Yield" is not like return. It yields the current value and then "pauses" the computation. That's all wrapped up in one object and not in a gazillion stack frames.
This will give you the series 1!, 2!, ..., n!:
def fac(n):
    if (n <= 0):
        yield 1
    else:
        v = 1
        for i in range(1, n+1):
            v = v * i
            yield v
There is no recursion, the intermediate results are stored in v which is most likely stored in one object (on the heap, probably).
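As a usage sketch, the generator can be consumed in full or only for its last value. The definition is repeated here so the snippet is self-contained; the deque trick for keeping just the final item is one possible choice, not something the answer prescribes.

```python
import math
from collections import deque


def fac(n):
    """Yield the running factorials 1!, 2!, ..., n! (or a single 1 for n <= 0)."""
    if n <= 0:
        yield 1
    else:
        v = 1
        for i in range(1, n + 1):
            v = v * i
            yield v


first_five = list(fac(5))

# Keep only the final value; a deque with maxlen=1 discards everything else.
last = deque(fac(3000), maxlen=1).pop()
matches_math = last == math.factorial(3000)
```

The input n=3000, which overflowed the stack in the recursive version from the question, is handled here by a single loop.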
What about yield from?
OK, that's interesting, since that was only added in Python 3.3.
yield from can be used to delegate to another generator.
You gave an example like:
def process_json(json_input, folder_path=""):
    # Some code
    yield from process_json(json_input['children'], folder_path)
This looks recursive, but it is really a combination of generator objects. You have your "inner" generator (which only uses the space of one object) and with yield from you say "I'd like to forward all the values from that generator to my caller".
So it doesn't generate one stack frame per generator result; instead it creates one object per generator used.
In this example, you are creating one generator object per child JSON object, and those objects are allocated on the heap. Be aware, however, that whenever a value is requested, every generator in the active yield from chain is resumed, one inside the other, so in CPython the nesting depth of the JSON still counts against the recursion limit, and a sufficiently deep tree raises RecursionError. Siblings do not add to the depth; only nesting does. As long as the nesting is shallow, this is not a practical problem. As for raw sizes: on my laptop, using Ubuntu Linux, ulimit -s gives me 8 MB for the default stack size, while my process memory size is unlimited (although I have only 8 GB of physical memory).
Look at this documentation page on generators: https://wiki.python.org/moin/Generators
And this QA: Understanding generators in Python
Some nice examples, also for yield from:
https://www.python-course.eu/python3_generators.php
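TL;DR: Generators are objects, and a plain generator built around a loop does not recurse. yield from delegates to another generator object; a chain of such delegations is only as deep as the data is nested, but in CPython that depth is still subject to the recursion limit. Recursion is only practical when the number of nested calls is bounded and small, or when your language implementation supports tail call optimization.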

Static methods for recursive functions within a class?

I'm working with nested dictionaries in Python (2.7) obtained from YAML objects and I have a couple of questions that I've been trying to answer by reading, without success. I'm somewhat new to Python.
One of the simplest functions is one that reads the whole dictionary and outputs a list of all the keys that exist in it. I use an underscore at the beginning since this function is later used by others within a class.
class Myclass(object):
    @staticmethod
    def _get_key_list(d,keylist):
        for key,value in d.iteritems():
            keylist.append(key)
            if isinstance(value,dict):
                Myclass._get_key_list(d.get(key),keylist)
        return list(set(keylist))
    def diff(self,dict2):
        keylist = []
        all_keys1 = self._get_key_list(self.d,keylist)
        all_keys2 = self._get_key_list(dict2,keylist)
        ... # More code
Question 1: Is this a correct way to do this? I am not sure whether it's good practice to use a static method for this reason. Since self._get_key_list(d,keylist) is recursive, I don't want "self" to be the first argument once the function is recursively called, which is what would happen for a regular instance method.
I have a bunch of static methods that I'm using, but I've read in a lot of places that they may not be good practice when used a lot. I also thought I could make them module functions, but I wanted them to be tied to the class.
Question 2: Instead of passing the argument keylist to self._get_key_list(d,keylist), how can I initialize an empty list inside the recursive function and update it? Initializing it inside would reset it to [] every time.
I would eliminate keylist as an explicit argument:
def _get_keys(d):
    keyset = set()
    for key, value in d.iteritems():
        keyset.add(key)
        if isinstance(value, dict):
            keyset.update(_get_keys(value))
    return keyset
Let the caller convert the set to a list if they really need a list, rather than an iterable.
Often, there is little reason to declare something as a static method rather than a function outside the class.
If you are concerned about efficiency (e.g., getting lots of repeat keys from a dict), you can go back to threading a single set/list through the calls as an explicit argument, but don't make it optional; just require that the initial caller supply the set/list to update. To emphasize that the second argument will be mutated, just return None when the function returns.
def _get_keys(d, result):
    for key, value in d.iteritems():
        result.add(key)
        if isinstance(value, dict):
            _get_keys(value, result)
result = set()
_get_keys(d1, result)
_get_keys(d2, result)
# etc
There's no good reason to make a recursive function in a class a static method unless it is meant to be invoked outside the context of an instance.
To initialize a parameter, we usually assign to it a default value in the parameter list, but in case it needs to be a mutable object such as an empty list in this case, you need to default it to None and then initialize it inside the function, so that the list reference won't get reused in the next call:
class Myclass(object):
    def _get_key_list(self, d, keylist=None):
        if keylist is None:
            keylist = []
        for key, value in d.iteritems():
            keylist.append(key)
            if isinstance(value, dict):
                self._get_key_list(d.get(key), keylist)
        return list(set(keylist))
    def diff(self, dict2):
        all_keys1 = self._get_key_list(self.d)
        all_keys2 = self._get_key_list(dict2)
        ... # More code
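To make the reason for the None default concrete, here is a small comparison written for Python 3 (so it uses items() rather than iteritems()). The two function names and the sample dictionary are invented for this illustration.

```python
def keys_shared_default(d, keylist=[]):
    """Pitfall: the default list is created once and reused by every call."""
    for key, value in d.items():
        keylist.append(key)
        if isinstance(value, dict):
            keys_shared_default(value, keylist)
    return keylist


def keys_fresh_default(d, keylist=None):
    """A new list is created whenever the caller does not supply one."""
    if keylist is None:
        keylist = []
    for key, value in d.items():
        keylist.append(key)
        if isinstance(value, dict):
            keys_fresh_default(value, keylist)
    return keylist


config = {"a": {"b": 1}, "c": 2}

# Copy the results, because the shared default keeps growing between calls.
shared_first = list(keys_shared_default(config))
shared_second = list(keys_shared_default(config))

fresh_first = keys_fresh_default(config)
fresh_second = keys_fresh_default(config)
```

The second call to the shared-default version still carries the keys collected by the first call, while the None-default version starts from an empty list each time.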

Changing the value of a Boolean Function in Python

I want to set the default value of a boolean function to be False and want to change it to True only for certain values of the input in between the code. Is there a way to do it?
I'm trying to write a simple DFS search code.
The code I'm using is this:
def visited(v):
    return False
def explore(v):
    visited(v) = True
    for (v,w) in E:
        if not visited(w):
            explore(w)
A function is probably the wrong tool here. Instead, try a set:
def explore(v, visited=set()):
    visited.add(v)
    for (v,w) in E:
        if w not in visited:
            explore(w)
I'm using a sometimes unintuitive behavior of default arguments in Python for this example code because it's convenient, but you could also use a different way of maintaining a shared set, such as a wrapper function that initializes a blank set and then calls a recursive helper function. (That would let you explore multiple times by resetting the set each time.)
Assuming you have a myfunc function returning a boolean whose behaviour you want to modify:
_myfunc = myfunc
def myfunc(*args):
    if some_condition:
        return _myfunc(*args)
    else:
        return False
This way, you will trigger the actual function only in the desired cases.
This solution overwrites the original name, but you are not obliged to do so.
No, you cannot set the return value of a function from outside the function. Instead, use a variable in the calling function.
For instance, here, you want to remember which nodes you visited. A set is good for remembering a set of objects.
visited = set()
def explore(v):
    visited.add(v)
    for (v,w) in E:
        if w not in visited:
            explore(w)
A couple of cautions about this:
If you call it twice, everything will be seen as already visited, because the state is tracked in a global. That's similar to what you already have but may or may not be what you want. If you want to be able to iterate twice, you need to pass this down as a parameter, and preferably add a second function that starts the recursion:
def explore(v):
    return explore_down(v, set())
def explore_down(v, visited):
    visited.add(v)
    for (v,w) in E:
        if w not in visited:
            explore_down(w, visited)
Also, depending on what type v and w are, you may get an error that they're not hashable, for which see this question.
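Putting the pieces together, here is a runnable sketch of the wrapper-plus-helper idea. It assumes that E is a list of directed (source, target) pairs and, unlike the snippets above, only follows edges that start at the current node; the edge data and function names are invented for the example.

```python
# Hypothetical edge list: a cycle 1 -> 2 -> 3 -> 1 and a separate edge 4 -> 5.
E = [(1, 2), (2, 3), (3, 1), (4, 5)]


def explore(start, edges):
    """Start a depth-first search with a fresh visited set and return it."""
    visited = set()
    _explore_from(start, edges, visited)
    return visited


def _explore_from(v, edges, visited):
    """Recursive helper that shares one visited set across all calls."""
    visited.add(v)
    for source, target in edges:
        if source == v and target not in visited:
            _explore_from(target, edges, visited)


reachable_from_1 = explore(1, E)
reachable_from_4 = explore(4, E)
```

Because each top-level call creates its own set, the two searches do not interfere with each other.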

Python: Return function won’t return a list

The following function prints installed apps from a server.
def getAppNames():
    for app in service.apps:
        print app.name
It works absolutely fine and it prints a list of installed apps like so:
App A
App B
App C
App D
App E
App F
However, when I change the "print" to a "return", all I get is "App A". I have seen similar questions on this but I can't find a solution and have explored different methods. Basically, I need the return version to behave like the print version. I would appreciate any help.
Thanks.
The return statement causes your function to immediately exit. From the documentation:
return leaves the current function call with the expression list (or
None) as return value.
The quick fix is to save the names in a temporary list, then return the list:
def getAppNames():
    result = []
    for app in service.apps:
        result.append(app.name)
    return result
Since this is such a common thing to do -- iterate over a list and return a new list -- python gives us a better way: list comprehensions.
You can rewrite the above like this:
def getAppNames():
    return [app.name for app in service.apps]
This is considered a "pythonic" solution, which means it uses special features of the language to make common tasks easier.
Another "pythonic" solution involves the use of a generator. Creating a generator involves taking your original code as-is, but replacing return with yield. However, this affects how you use the function. Since you didn't show how you are using the function, I'll not show that example here since it might add more confusion than clarity. Suffice it to say there are more than two ways to solve your problem.
There are two solutions:
def getAppNames():
    return [app.name for app in service.apps]
or
def getAppNames():
    for app in service.apps:
        yield app.name
Unlike return, yield will stay in the loop. This is called a "generator" in Python. You can then use list() to turn the generator into a list or iterate over it:
for name in getAppNames():
    ...
The advantage of the generator is that it doesn't have to build the whole list in memory.
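The difference between the two variants shows up in how the result is consumed. The sketch below fakes the service object with SimpleNamespace, since the real server API is not shown in the question; the function names are likewise chosen for illustration.

```python
from types import SimpleNamespace

# Hypothetical stand-in for the real service object from the question.
service = SimpleNamespace(
    apps=[SimpleNamespace(name=f"App {letter}") for letter in "ABC"]
)


def get_app_names_list():
    """Build and return the complete list at once."""
    return [app.name for app in service.apps]


def get_app_names_lazy():
    """Yield one name at a time; nothing runs until the caller iterates."""
    for app in service.apps:
        yield app.name


names = get_app_names_list()

lazy = get_app_names_lazy()
first = next(lazy)   # runs the generator up to its first yield
rest = list(lazy)    # drains whatever is left
```

A generator can be iterated only once, so call the function again if the names are needed a second time.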
You should read basic docs about Python :)
try this:
def getAppNames():
    return [app.name for app in service.apps]
After a return statement the function "ends" - it leaves the function so your for loop actually does only a single iteration.
To return a list you can do -
return [app for app in service.apps]
or just -
return service.apps
The first time you hit a return statement, the function returns, which is why you only get one result.
Some options:
Use 'yield' to yield results and make the function act as a generator
Collect all the items in a list (or some other collection) and return that.

Workarounds to suspend (serialize) and resume a recursive generator stack?

I have a recursive generator function that creates a tree of ChainMap contexts, and finally does something with the context at the end of the tree. It looks like this (parent_context is a ChainMap, hierarchy is a list):
def recursive_generator(parent_context, hierarchy):
    next_level = hierarchy[0]
    next_level_contexts = get_contexts(next_level) # returns a list of dicts
    for context in next_level_contexts:
        # new_child(context) returns the new ChainMap; .update() would return None
        child_context = parent_context.new_child(context)
        if next_level == hierarchy[-1]:
            yield do_something(**child_context)
        else:
            yield from recursive_generator(child_context, hierarchy[1:])
Now I'd like to flag one level of the hierarchy such that the operation suspends after finishing that level, serializes the state to disk to be picked up later where it left off. Is there a way to do this without losing the elegance of the recursion?
I know that you can't pickle generators, so I thought about refactoring into an iterator object. But I think yield from is necessary for the recursion here (edit: at least without some tedious management of the stack), so I think it needs to be a generator, no? Is there a workaround for this?
you seem to be exploring a tree with DFS. so you could construct the tree in memory and make the DFS explicit. then just store the tree and restart at the left-most node (i think?).
that's effectively "tedious management of the stack", but it has a nice picture that would help implement it (at least for me, looking at your problem as DFS of a tree makes the implementation seem fairly obvious - before i thought of it like that, it seemed quite complicated - but i may be missing something).
sorry if that's obvious and insufficient...
[edit]
class Inner:
    def __init__(self, parent_context, hierarchy):
        self.children = []
        next_level = hierarchy[0]
        next_level_contexts = get_contexts(next_level)
        for context in next_level_contexts:
            child_context = parent_context.new_child(context)
            if next_level == hierarchy[-1]:
                self.children.append(Leaf(child_context))
            else:
                self.children.append(Inner(child_context, hierarchy[1:]))
    def do_something(self):
        # this will do something on the left-most leaf
        self.children[0].do_something()
    def prune(self):
        # this will remove the left-most leaf
        if isinstance(self.children[0], Leaf):
            self.children.pop(0)
        else:
            self.children[0].prune()
            if not self.children[0]:
                self.children.pop(0)
    def __bool__(self):
        return bool(self.children)
class Leaf:
    def __init__(self, context):
        self.context = context
    def do_something(self):
        do_something(**self.context)
the code above hasn't been tested. i ended up using classes for nodes as a tuple seemed too confusing. you create the tree by creating the parent node. then you can "do something" by calling do_something, after which you will want to remove the "done" leaf with prune:
tree = Inner(initial_context, initial_hierarchy)
while tree:
tree.do_something()
tree.prune()
i am pretty sure it will contain bugs, but hopefully it's enough to show the idea. sorry i can't do more but i need to repot plants....
ps it's amusing that you can write code with generators, but didn't know what DFS was. you might enjoy reading the "algorithm design manual" - it's part textbook and part reference, and it doesn't treat you like an idiot (i too have no formal education in computer science, and i thought it was a good book).
[edited to change to left-most first, which is what you had before, i think]
and alko has a good point...
Here's what I ended up doing:
from collections import ChainMap
def recursive_generator(parent_context, hierarchy):
    next_level = hierarchy[0]
    next_level_contexts = get_contexts(next_level) # returns a list of dicts
    for context in next_level_contexts:
        # new_child(context) returns the new ChainMap; .update() would return None
        child_context = parent_context.new_child(context)
        if next_level == hierarchy[-1]:
            yield child_context
        else:
            yield from recursive_generator(child_context, hierarchy[1:])
def traverse_tree(hierarchy):
    return list(recursive_generator(ChainMap(), hierarchy))
def do_things(contexts, start, stop):
    for context in contexts[start:stop]:
        yield do_something(**context)
Then I can pickle the list returned by traverse_tree and later load it and run it in pieces with do_things. This is all in a class with a lot more going on of course, but this gets to the gist of it.
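A compact, runnable sketch of that suspend-and-resume flow follows. The level data, get_contexts and do_something are placeholders invented for the example, each leaf context is flattened to a plain dict before pickling, and pickle.dumps/loads stand in for writing to and reading from a file on disk.

```python
import pickle
from collections import ChainMap

# Hypothetical per-level data; the real get_contexts is not shown in the post.
LEVELS = {
    "region": [{"region": "eu"}, {"region": "us"}],
    "host": [{"host": "a"}, {"host": "b"}],
}


def get_contexts(level):
    return LEVELS[level]


def do_something(region, host):
    # Placeholder for the real work done at each leaf.
    return f"{region}-{host}"


def leaf_contexts(parent_context, hierarchy):
    """Yield one flattened dict per leaf of the context tree."""
    for context in get_contexts(hierarchy[0]):
        child = parent_context.new_child(context)
        if len(hierarchy) == 1:
            yield dict(child)
        else:
            yield from leaf_contexts(child, hierarchy[1:])


def run_slice(contexts, start, stop):
    return [do_something(**c) for c in contexts[start:stop]]


contexts = list(leaf_contexts(ChainMap(), ["region", "host"]))

# Do the first part of the work, then "suspend" by serializing the rest.
first_part = run_slice(contexts, 0, 2)
saved = pickle.dumps({"contexts": contexts, "next_index": 2})

# Later (possibly in another process): load the state and continue.
state = pickle.loads(saved)
second_part = run_slice(state["contexts"], state["next_index"], None)
```

The trade-off is that the whole list of leaf contexts is materialized up front, which is what makes it serializable in the first place.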
