I have a recursive generator function that creates a tree of ChainMap contexts, and finally does something with the context at the end of the tree. It looks like this (parent_context is a ChainMap, hierarchy is a list):
def recursive_generator(parent_context, hierarchy):
    next_level = hierarchy[0]
    next_level_contexts = get_contexts(next_level)  # returns a list of dicts
    for context in next_level_contexts:
        child_context = parent_context.new_child(context)
        if next_level == hierarchy[-1]:
            yield do_something(**child_context)
        else:
            yield from recursive_generator(child_context, hierarchy[1:])
Now I'd like to flag one level of the hierarchy so that the operation suspends after finishing that level and serializes its state to disk, to be picked up later where it left off. Is there a way to do this without losing the elegance of the recursion?
I know that you can't pickle generators, so I thought about refactoring into an iterator object. But I think yield from is necessary for the recursion here (edit: at least without some tedious management of the stack), so I think it needs to be a generator, no? Is there a workaround for this?
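For reference, the pickling restriction is easy to demonstrate; a minimal sketch:

```python
import pickle

def walk():
    yield 1  # any generator will do

gen = walk()
try:
    pickle.dumps(gen)
except TypeError as exc:
    print("generators cannot be pickled:", exc)
```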
you seem to be exploring a tree with DFS. so you could construct the tree in memory and make the DFS explicit. then just store the tree and restart at the left-most node (i think?).
that's effectively "tedious management of the stack", but it has a nice picture that would help implement it (at least for me, looking at your problem as DFS of a tree makes the implementation seem fairly obvious - before i thought of it like that, it seemed quite complicated - but i may be missing something).
sorry if that's obvious and insufficient...
[edit]
class Inner:
    def __init__(self, context, hierarchy):
        self.children = []
        next_level = hierarchy[0]
        next_level_contexts = get_contexts(next_level)
        for ctx in next_level_contexts:
            child_context = context.new_child(ctx)
            if next_level == hierarchy[-1]:
                self.children.append(Leaf(child_context))
            else:
                self.children.append(Inner(child_context, hierarchy[1:]))

    def do_something(self):
        # this will do something on the left-most leaf
        self.children[0].do_something()

    def prune(self):
        # this will remove the left-most leaf
        if isinstance(self.children[0], Leaf):
            self.children.pop(0)
        else:
            self.children[0].prune()
            if not self.children[0]:
                self.children.pop(0)

    def __bool__(self):
        return bool(self.children)


class Leaf:
    def __init__(self, context):
        self.context = context

    def do_something(self):
        do_something(**self.context)
the code above hasn't been tested. i ended up using classes for nodes as a tuple seemed too confusing. you create the tree by creating the parent node. then you can "do something" by calling do_something, after which you will want to remove the "done" leaf with prune:
tree = Inner(initial_context, initial_hierarchy)
while tree:
    tree.do_something()
    tree.prune()
i am pretty sure it will contain bugs, but hopefully it's enough to show the idea. sorry i can't do more but i need to repot plants....
ps it's amusing that you can write code with generators, but didn't know what DFS was. you might enjoy reading the "algorithm design manual" - it's part textbook and part reference, and it doesn't treat you like an idiot (i too have no formal education in computer science, and i thought it was a good book).
[edited to change to left-most first, which is what you had before, i think]
and alko has a good point...
Here's what I ended up doing:
def recursive_generator(parent_context, hierarchy):
    next_level = hierarchy[0]
    next_level_contexts = get_contexts(next_level)  # returns a list of dicts
    for context in next_level_contexts:
        child_context = parent_context.new_child(context)
        if next_level == hierarchy[-1]:
            yield child_context
        else:
            yield from recursive_generator(child_context, hierarchy[1:])

def traverse_tree(hierarchy):
    return list(recursive_generator(ChainMap(), hierarchy))

def do_things(contexts, start, stop):
    for context in contexts[start:stop]:
        yield do_something(**context)
Then I can pickle the list returned by traverse_tree and later load it and run it in pieces with do_things. This is all in a class with a lot more going on of course, but this gets to the gist of it.
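A minimal runnable sketch of that pickle-and-resume cycle, with stand-in implementations for get_contexts and do_something (the real versions are whatever your class provides):

```python
import pickle
from collections import ChainMap

# hypothetical stand-ins for the real helpers in the question
def get_contexts(level):
    return [{"level": level, "i": i} for i in range(2)]

def do_something(**context):
    return sorted(context.items())

def recursive_generator(parent_context, hierarchy):
    next_level = hierarchy[0]
    for context in get_contexts(next_level):
        child_context = parent_context.new_child(context)
        if next_level == hierarchy[-1]:
            yield dict(child_context)  # plain dicts pickle with no trouble
        else:
            yield from recursive_generator(child_context, hierarchy[1:])

contexts = list(recursive_generator(ChainMap(), ["chapter", "page"]))
blob = pickle.dumps(contexts)      # serialize the flattened tree ...
restored = pickle.loads(blob)      # ... and pick it up later
first_batch = [do_something(**c) for c in restored[:2]]  # run it in pieces
```

Converting each leaf ChainMap to a plain dict sidesteps any pickling surprises with custom context objects.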
I'm using an unstable python library that's undergoing changes to the API in its various git submodules. Every few weeks, some member or method's location will get changed or renamed. For example, the expression
vk.rinkront[0].asyncamera.printout()
that worked a few weeks back, now has the asyncamera member located in
vk.toplink.enabledata.tx[0].asyncamera[0].print()
I manage to figure out the method's new location by grepping, git diffing, IPython autocompletion, and generally bumbling my way around. This process is painful, because the repository is quite abstracted and the member names observable from the Python shell do not necessarily appear in the git submodule code.
Is there a Python routine that performs a graph traversal of the object hierarchy while checking for keywords (e.g. print)?
(If not, I'll hack up some BFS/DFS that checks the child node type before pushing the contents of __dict__, arrays, dir(), etc. onto a queue/stack. But someone must have come across this problem before and come up with a more elegant solution.)
-------EDITS----------
to sum up, is there an existing library such that the following code would work?
import unstable_library
import object_inspecter.search

vk = unstable_library.create()  # initializing vk
object_inspecter.search(vk, "print")  # searching for lost method
Oy, I feel for you... You probably should just lock down on a version, use it until the API becomes stable, and then upgrade later. That's why complex projects focus on dependency version control.
There is a way, but it's not really standard. The dir() built-in function returns a list of the string names of all attributes and methods of an object. With a little bit of wrangling, you could write a script that recursively digs down.
The ultimate issue, though, is that you're going to run into infinite recursive loops when you try to investigate class members. You'll need to add in smarts to either recognize the pattern or limit the depth of your searching.
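A minimal sketch of such a depth-limited dir() walk, with an id()-based visited set to break the cycles mentioned above (the function name, path format, and depth limit are illustrative, not any standard routine):

```python
def search_attrs(obj, keyword, path="obj", depth=3, seen=None):
    """Recursively search public attribute names for `keyword`."""
    seen = set() if seen is None else seen
    if id(obj) in seen or depth < 0:   # cycle or depth limit reached
        return []
    seen.add(id(obj))
    hits = []
    for name in dir(obj):
        if name.startswith("_"):       # skip private/dunder attributes
            continue
        if keyword in name:
            hits.append(f"{path}.{name}")
        try:
            child = getattr(obj, name)
        except Exception:              # some descriptors raise on access
            continue
        hits.extend(search_attrs(child, keyword, f"{path}.{name}",
                                 depth - 1, seen))
    return hits
```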
you can use fairly simple graph traversal to do this
import re
import types

def find_thing(ob, pattern, seen=set(), name=None, curDepth=0, maxDepth=99, isDict=True):
    # note: the shared default `seen` set is deliberate here -- it carries
    # visited ids across the recursive calls (and across top-level calls)
    if ob is None:
        return []
    if id(ob) in seen:
        return []
    seen.add(id(ob))
    name = str(name or getattr(ob, "__name__", str(ob)))
    base_case = check_base_cases(ob, name, pattern)
    if base_case is not None:
        return base_case
    if curDepth >= maxDepth:
        return []
    return recursive_step(ob, pattern, name=name, curDepth=curDepth, isDict=isDict)
now just define your two steps (base_case and recursive_step)... something like
def check_base_cases(ob, name, pattern):
    if isinstance(ob, str):
        if re.match(pattern, ob):
            return [ob]
        else:
            return []
    if isinstance(ob, (int, float)):  # Python 3: `long` no longer exists
        if ob == pattern or str(ob) == pattern:
            return [ob]
        else:
            return []
    if isinstance(ob, types.FunctionType):
        if re.match(pattern, name):
            return [name]
        else:
            return []
    return None  # not a base case: the caller falls through to the recursive step
def recursive_step(ob, pattern, name, curDepth, isDict=True):
    matches = []
    if isinstance(ob, (list, tuple)):
        for i, sub_ob in enumerate(ob):
            matches.extend(find_thing(sub_ob, pattern,
                                      name='%s[%s]' % (name, i),
                                      curDepth=curDepth + 1))
        return matches
    if isinstance(ob, dict):
        for key, item in ob.items():
            if re.match(pattern, str(key)):
                matches.append('%s.%s' % (name, key) if not isDict
                               else '%s["%s"]' % (name, key))
            else:
                matches.extend(find_thing(item, pattern,
                                          name='%s["%s"]' % (name, key) if isDict
                                          else '%s.%s' % (name, key),
                                          curDepth=curDepth + 1))
        return matches
    else:
        data = dict((x, getattr(ob, x)) for x in dir(ob) if not x.startswith("_"))
        return find_thing(data, pattern, name=name, curDepth=curDepth + 1, isDict=False)
finally you can test it like so
print(find_thing(vk,".*print.*"))
I used the following Example
class vk(object):
    class enabledata:
        class camera:
            class object2:
                def printX(*args):
                    pass
            asynccamera = [object2]
        tx = [camera]
    toplink = {'enabledata': enabledata}
running the following
print(find_thing(vk, '.*print.*'))
#['vk.toplink["enabledata"].camera.asynccamera[0].printX']
Suppose there are six kinds of problems to be handled when reading a book.
I illustrate the details as follows:
while True:
    if encounter A:
        handle A
        # during handling the problem, it might spawn new problems of
        # A, B, C, D, E, or none
        produce (A, B, ..., E or null)
        continue
    if B occurs:
        handle B
        # during handling the problem, it might spawn new problems of
        # A, B, C, D, E, or none
        produce (A, B, ..., E or null)
        continue
    if C happens:
        handle C
        produce (A, B, ..., E or null)
        continue
    ...
    if there are no problems:
        break
Assume I have three problems: the above program might loop endlessly on the first one and never touch the second.
Take reading a book as an example:
'problem A' is defined as encountering a 'new word', and handling it means looking it up in a dictionary.
While looking it up, I might come across another new word, then another and another.
In this case, I will never finish reading even one sentence of the book.
As a solution,
I introduce a container to collect the problems, value-weight them, and then determine which one to execute.
def solve_problems(problems):
    problem_basket = list(problems)
    while True:
        if problem_basket is not empty:
            # value-weight all the problems
            # and determine which one to handle
            value_weight problems
            problem = x
            if problem == A:
                handle A
                problem_basket.append(new_problem)
                continue
            if problem == B:
                handle B
                problem_basket.append(new_problem)
                continue
            ...
        if problem_basket is empty:
            return
I also tried a recursive variant, seeking inspiration and better efficiency.
def solve_problems(problems):
    global problem_basket
    problem_basket = list(problems)
    value_weight problems
    problem = x
    if problem == A:
        handle A
        problem_basket.append(new_problem)
        solve_problems(problem_basket)
    if problem == B:
        handle B
        problem_basket.append(new_problem)
        solve_problems(problem_basket)
    if problem == C:
        handle C
        problem_basket.append(new_problem)
        solve_problems(problem_basket)
    ...
    if problem_basket is empty:
        return
Again, the value-weighting process consumes huge effort and time.
How can I solve such a problem with a proper algorithm?
'Problem A' is defined as encountering a 'new word'; handling it means looking it up in a dictionary.
When looking it up, I might come across another new word, then another and another.
In this case, I will never finish reading one sentence of the book.
Looks like it will eventually end up reading the sentence, since the number of new words is limited by the dictionary size. In general it sounds OK to me, unless there are some other restrictions not explicitly mentioned, like finishing the sentence within a limited time.
How can I solve such a problem with a proper algorithm?
Well, if there is no "limited time" restriction, your original algorithm is almost perfect. To make it even better in terms of overall performance, we might handle all problems of type A first, then move to B and so on. This increases the data locality and the overall performance of the algorithm.
But if there is a "limited time" restriction, we can end up reading the full sentence in that time (without full understanding), or reading part of the sentence (fully understanding that part), or something in between (as suggested by @Lauro Bravar).
From the example above it is not quite clear how you do the value_weight, but the proper name for this kind of problem is priority queueing. There is a variety of algorithms and implementations; please have a look at the Wikipedia page for details: https://en.wikipedia.org/wiki/Priority_queue
You can do several things to approach such a problem.
One of them is to set a value of "max-iterations" or "max-effort" in a Machine Learning style that you can invest into reading a book. Therefore you will execute (handle) only up to a number of actions, until the limit has been reached. This solution will look like:
while(effortRemaining > 0){
# Do your actions
}
The actions you do should be the ones that report more benefit/less effort according to some metric that you need to define.
When you perform a certain action (handle), you subtract the cost/effort of that action from effortRemaining and continue with your flow.
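A toy sketch of that budgeted loop, with made-up (kind, cost) pairs standing in for the real problems and metric:

```python
# hypothetical problem list: each entry is (kind, cost); the budget is the
# total effort we are willing to invest
effort_remaining = 10
queue = [("A", 2), ("B", 3), ("A", 2), ("C", 5), ("B", 3)]
handled = []

while queue and effort_remaining > 0:
    kind, cost = queue.pop(0)
    if cost > effort_remaining:
        break                     # cannot afford this action within the budget
    handled.append(kind)          # "handle" the problem
    effort_remaining -= cost      # pay for it
```

With these numbers, the loop handles A, B, A (total cost 7) and stops when C's cost of 5 exceeds the remaining budget of 3.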
You already have the algorithm (with the help of Andriy for the priority queue), but you lack the design. When I see your multiple ifs that check the type of the problem, I think of polymorphism.
Why not try OOP? You have two objects to define: a problem and a priority queue. Fortunately, the priority queue is defined in the heapq module. Let's focus on the problem:
in its core definition, a problem can be handled and may be compared to other problems (it is more or less urgent). Note that, guided by OOP principles, I do not talk of the structure or implementation of a problem,
but only of the functions of a problem:
class Problem():
    def handle(self, *args):  # we may need a dictionary, a database connection, ...
        ...
    def compare(self, other):
        ...
But you said that when a problem is handled, it may add new problems to the queue. So let's add a precision to the definition of handle:
    def handle(self, queue, *args):  # we still may need a dictionary, a database connection, ...
        ...
In Python, compare is a special method named __lt__, for "lower than". (You have other special comparison methods, but __lt__ will be sufficient here.)
Here's a basic implementation example:
import heapq
import random

class Problem():
    def __init__(self, name, weight):
        self.__name = name
        self.__weight = weight

    def handle(self, queue):
        print("handle the problem {}".format(self))
        while random.random() > 0.75:  # randomly add new problems for the example
            new_problem = Problem(self.__name * 2, random.randint(0, 100))
            print("-> Add the problem {} to the queue".format(new_problem))
            heapq.heappush(queue, new_problem)  # add the problem to the queue

    def __lt__(self, other):
        return self.__weight > other.__weight  # note the >

    def __repr__(self):  # to show in lists
        return "Problem({}, {})".format(self.__name, self.__weight)
Wait! Why "lower than" and a >? That's because the module heapq is a min-heap: it returns the smallest element first. Thus, we define the big weights as smaller than the little weights.
Now, we can build a begin queue with fake data for the example:
queue = []
for name in ["foo", "bar", "baz"]:
    problem = Problem(name, random.randint(0, 100))
    heapq.heappush(queue, problem)  # add the problem to the queue
And run the main loop:
while queue:
    print("Current queue", queue)
    problem = heapq.heappop(queue)  # the problem with the max weight in O(lg n)
    problem.handle(queue)
I guess you will be able to subclass the Problem class to represent the various problems you might want to handle.
Python beginner here. I currently have some code that looks like
a = some_file_reading_function('filea')
b = some_file_reading_function('fileb')
# ...
if some_condition(a):
    do_complicated_stuff(b)
else:
    pass  # nothing that involves b
What itches me is that loading 'fileb' may not be necessary, and it carries a performance penalty. Ideally, I would load it only if b is actually required later on. OTOH, b might be used multiple times, so if it is used at all, the file should be loaded once and for all. I do not know how to achieve that.
In the above pseudocode, one could trivially move the loading of 'fileb' inside the conditional branch, but in reality there are more than two files and the conditional branching is quite complex. Also, the code is still under heavy development and the conditional branching may change.
I looked a bit at either iterators or defining a class, but (probably due to my inexperience) could not make either work. The key problem I met was to load the file zero times if unneeded, and only once if needed. I found nothing on searching, because "how to load a file by chunks" pollutes the results for "file lazy loading" and similar queries.
If needed: Python 3.5 on Win7, and some_file_reading_function returns 1-D numpy.ndarrays.
class LazyFile():
    def __init__(self, file):
        self.file = file
        self._data = None

    @property  # so you can use .data instead of .data()
    def data(self):
        if self._data is None:  # if not loaded
            self._data = some_file_reading_function(self.file)  # load it
        return self._data

a = LazyFile('filea')
b = LazyFile('fileb')

if some_condition(a.data):
    do_complicated_stuff(b.data)
else:
    pass  # other stuff
Actually, just found a workaround with classes. Try/except inspired by How to know if an object has an attribute in Python. A bit ugly, but does the job:
class Filecontents:
    def __init__(self, filepath):
        self.fp = filepath

    def eval(self):
        try:
            self.val
        except AttributeError:
            self.val = some_file_reading_function(self.fp)
        finally:
            return self.val

def some_file_reading_function(fp):
    # For demonstration purposes: say that you are loading something
    print('Loading ' + fp)
    # Return value
    return 0

a = Filecontents('somefile')
print('Not used yet')
print('Use #1: value={}'.format(a.eval()))
print('Use #2: value={}'.format(a.eval()))
Not sure that is the "best" (prettiest, most Pythonic) solution though.
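For what it's worth, on Python 3.8+ the same once-only loading can be written more idiomatically with functools.cached_property; a sketch, keeping the demonstration print from above (the class and file names are just placeholders):

```python
from functools import cached_property

class FileContents:
    """Load the file once, on first access to .data (Python 3.8+)."""
    def __init__(self, filepath):
        self.filepath = filepath

    @cached_property
    def data(self):
        # stand-in for some_file_reading_function
        print('Loading ' + self.filepath)
        return 0

a = FileContents('somefile')
print('Not used yet')                     # nothing has been loaded so far
print('Use #1: value={}'.format(a.data))  # triggers the load
print('Use #2: value={}'.format(a.data))  # reuses the cached value
```

cached_property stores the computed value in the instance's __dict__ on first access, so later accesses never call the method again.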
I am using a recursive function call to traverse a tree, and I want to add the locations of valuable nodes to a master list. My current method is to use a global. How do I pass this list by reference instead (or solve this in another way without globals)?
hcList = []

def expand(node):
    global hcList
    if node.hasTreasure():
        hcList.append(node)
    if not node.end():
        expand(node.next())

expand(startnode)
hcList.filter()
Is there any way to do something like below without using a hairy global? My actual code is much messier with globals, but the concepts are the same. The code below doesn't work the way I want it to. Namely, hcList is empty.
def expand(node, hcList):
    if node.hasTreasure():
        hcList.append(node)
    if not node.end():
        expand(node.next(), hcList)

hcList = []
expand(startnode, hcList)
hcList.filter()
For recursion, it is frequently simpler to return the new value
def expand(node, hcList):
    if node.hasTreasure():
        hcList.append(node)
    if node.end():
        return hcList
    return expand(node.next(), hcList)

hcList = expand(startnode, [])
hcList.filter()  # not sure why this was in the OP
If your list is very deep, you may have a lot on the stack; note that CPython does not perform tail-call optimization, so the recursion limit still applies to long chains.
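Since CPython never optimizes tail calls away, an iterative rewrite avoids the recursion limit entirely; a sketch, with a toy Node class standing in for the real node API from the question:

```python
def expand_iter(node):
    # iterative version of expand(): walk the chain without recursion
    hc_list = []
    while True:
        if node.hasTreasure():
            hc_list.append(node)
        if node.end():
            return hc_list
        node = node.next()

# toy node type, standing in for the OP's actual node API
class Node:
    def __init__(self, treasures, i=0):
        self.treasures, self.i = treasures, i
    def hasTreasure(self):
        return self.treasures[self.i]
    def end(self):
        return self.i == len(self.treasures) - 1
    def next(self):
        return Node(self.treasures, self.i + 1)

found = expand_iter(Node([True, False, True]))
```

Because the tail call was the only recursion, converting it to a loop is mechanical: reassign `node` and continue.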
Is there a more Pythonic way to put this loop together?:
while True:
    children = tree.getChildren()
    if not children:
        break
    tree = children[0]
UPDATE:
I think this syntax is probably what I'm going to go with:
while tree.getChildren():
    tree = tree.getChildren()[0]
children = tree.getChildren()
while children:
    tree = children[0]
    children = tree.getChildren()
It would be easier to suggest something if I knew what kind of collection api you're working with. In a good api, you could probably do something like
while tree.hasChildren():
    children = tree.getChildren()
    tree = children[0]
(My first answer suggested to use iter(tree.getChildren, None) directly, but that won't work as we are not calling the same tree.getChildren function all the time.)
To fix this up I propose a solution that relies on the lambda's late binding of its free variables as a workaround. I think at this point this solution is no better than any other previously posted:
You can use iter() in its second, sentinel form, exploiting that late binding:
for children in iter(lambda: tree.getChildren(), None):
    tree = children[0]
(Here it is assumed that getChildren() returns None when there are no children, but the sentinel has to be replaced with whatever value it actually returns ([]?).)
iter(function, sentinel) calls function repeatedly until it returns the sentinel value.
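A small concrete example of the two-argument form (unrelated to trees, just to show the mechanics):

```python
import io

# read a stream in fixed-size chunks; '' is the sentinel read() returns at EOF
stream = io.StringIO("abcdefghij")
chunks = list(iter(lambda: stream.read(4), ''))
print(chunks)  # ['abcd', 'efgh', 'ij']
```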
Do you really only want the first branch? I'm gonna assume you don't and that you want the whole tree. First I'd do this:
def allitems(tree):
    for child in tree.getChildren():
        yield child
        for grandchild in allitems(child):
            yield grandchild
This will go through the whole tree. Then you can just:
for item in allitems(tree):
    do_whatever_you_want(item)
Pythonic, simple, clean, and since it uses generators, will not use much memory even for huge trees.
I think the code you have is fine. If you really wanted to, you could wrap it all up in a try/except:
while True:
    try:
        tree = tree.getChildren()[0]
    except (IndexError, TypeError):
        break
IndexError will work if getChildren() returns an empty list when there are no children. If it returns False or 0 or None or some other unsubscriptable false-like value, TypeError will handle the exception.
But that's just another way to do it. Again, I don't think the Pythonistas will hunt you down for the code you already have.
Without further testing, I believe this should work:
try:
    while True:
        tree = tree.getChildren()[0]
except:
    pass
You might also want to override the __getitem__() (the brackets operator) in the Tree class, for further neatification.
try:
    while True:
        tree = tree[0]
except:
    pass