python recursion not working with yield - python

I made a tree data structure and a function which gives out all its leaves, but the recursive algorithm never seems to work for any of the child nodes. The function gets called once using the root node:
def get_files(self, initials):
    for child in self.children:
        name = initials + os.sep + child.name
        if child.children == []:
            yield name
        else:
            child.get_files(name)
full class: https://pastebin.com/4eukaVWx

if child.children == []:
    yield name
else:
    child.get_files(name)
Here you're yielding only in the if. In the other branch, the data is lost. You need to yield the elements returned by child.get_files(name). I'd do:
if not child.children:
    yield name
else:
    yield from child.get_files(name)
yield from is available in Python 3.3 and later. An alternative for older versions is a loop:
for item in child.get_files(name):
    yield item
(a similar issue happens a lot with functions: Why does my function return None?)
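To see the fix end to end, here is a minimal sketch. It assumes a hypothetical Node class with name and children attributes, standing in for the class in the pastebin (which is not reproduced here). Calling child.get_files(name) on its own only creates a generator object and throws it away; yield from hands the child's leaves up to the caller.

```python
import os

class Node:
    """Hypothetical stand-in for the asker's tree class."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def get_files(self, initials):
        for child in self.children:
            name = initials + os.sep + child.name
            if not child.children:
                yield name  # leaf: emit its full path
            else:
                # delegate to the child's generator and re-yield its items
                yield from child.get_files(name)

root = Node("root", [
    Node("a.txt"),
    Node("docs", [Node("b.txt"), Node("img", [Node("c.png")])]),
])
leaves = list(root.get_files("root"))
print(leaves)
```

The leaves come out in depth-first order, each with the path of its ancestors joined by os.sep.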

Not a solution but an observation:
I guess you are printing something in the pastebin code and trimmed out the print statement just to post a minimal example in the question. It works completely fine without the print statements, but as soon as you put a single print statement in the method, the recursion stops happening.

Related

searching python object hierarchy

I'm using an unstable python library that's undergoing changes to the API in its various git submodules. Every few weeks, some member or method's location will get changed or renamed. For example, the expression
vk.rinkront[0].asyncamera.printout()
that worked a few weeks back, now has the asyncamera member located in
vk.toplink.enabledata.tx[0].asyncamera[0].print()
I manage to figure out the method's new location by grepping, git diffing, ipython autocomplete, and generally bumbling my way around. This process is painful, because the repository has been quite abstracted and the member names observable from the python shell do not necessarily appear in the git submodule code.
Is there a python routine that performs a graph traversal of the object hierarchy while checking for keywords (e.g. print)?
(If not, I'll hack some bfs/dfs that checks the child node type before pushing/appending the __dict__, array, dir(), etc. contents onto a queue/stack. But someone must have come across this problem before and come up with a more elegant solution.)
-------EDITS----------
to sum up, is there an existing library such that the following code would work?
import unstable_library
import object_inspecter.search
vk = unstable_library.create()  # initializing vk
object_inspecter.search(vk, "print")  # searching for lost method
Oy, I feel for you... You probably should just lock down on a version, use it until the API becomes stable, and then upgrade later. That's why complex projects focus on dependency version control.
There is a way, but it's not really standard. The dir() built-in function returns a list of string names of all attributes and methods of an object. With a little bit of wrangling, you could write a script that recursively digs down.
The ultimate issue, though, is that you're going to run into infinite recursive loops when you try to investigate class members. You'll need to add in smarts to either recognize the pattern or limit the depth of your searching.
You can use a fairly simple graph traversal to do this:
import re
import types

def find_thing(ob, pattern, seen=None, name=None, curDepth=0, maxDepth=99, isDict=True):
    if seen is None:
        seen = {}  # fresh per top-level call; a mutable default would leak between calls
    if ob is None:
        return []
    if id(ob) in seen:
        return []
    seen[id(ob)] = ob  # keep a reference so the id cannot be reused by a temporary object
    name = str(name or getattr(ob, "__name__", str(ob)))
    base_case = check_base_cases(ob, name, pattern)
    if base_case is not None:
        return base_case
    if curDepth >= maxDepth:
        return []
    return recursive_step(ob, pattern, seen, name=name, curDepth=curDepth, maxDepth=maxDepth, isDict=isDict)
Now just define your two steps (check_base_cases and recursive_step), something like:
def check_base_cases(ob, name, pattern):
    # returns a list of matches for leaf values, or None if ob has to be searched recursively
    if isinstance(ob, str):
        if re.match(pattern, ob):
            return [ob]
        else:
            return []
    if isinstance(ob, (int, float)):
        if ob == pattern or str(ob) == pattern:
            return [ob]
        else:
            return []
    if isinstance(ob, types.FunctionType):
        if re.match(pattern, name):
            return [name]
        else:
            return []

def recursive_step(ob, pattern, seen, name, curDepth, maxDepth=99, isDict=True):
    matches = []
    if isinstance(ob, (list, tuple)):
        for i, sub_ob in enumerate(ob):
            matches.extend(find_thing(sub_ob, pattern, seen, name='%s[%s]' % (name, i),
                                      curDepth=curDepth+1, maxDepth=maxDepth))
        return matches
    if isinstance(ob, dict):
        for key, item in ob.items():
            if re.match(pattern, str(key)):
                matches.append('%s.%s' % (name, key) if not isDict else '%s["%s"]' % (name, key))
            else:
                matches.extend(find_thing(item, pattern, seen,
                                          name='%s["%s"]' % (name, key) if isDict else '%s.%s' % (name, key),
                                          curDepth=curDepth+1, maxDepth=maxDepth))
        return matches
    else:
        # any other object: search its public attributes as if they were a dict
        data = dict([(x, getattr(ob, x)) for x in dir(ob) if not x.startswith("_")])
        return find_thing(data, pattern, seen, name=name, curDepth=curDepth+1, maxDepth=maxDepth, isDict=False)
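Finally you can test it like so:
print(find_thing(vk, ".*print.*"))
I used the following example:
class vk(object):
    class enabledata:
        class camera:
            class object2:
                def printX(*args):
                    pass
            asynccamera = [object2]
        tx = [camera]
    toplink = {'enabledata': enabledata}
Running the following
print(find_thing(vk, '.*print.*'))
# output reported in the original answer:
# ['vk.toplink["enabledata"].camera.asynccamera[0].printX']
# (each object is reported only through the first path that reaches it,
# so the exact path can vary with the order in which attributes are visited)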

Exporting duplicated code from inside a yielding generator function

Observe the following method:
def _locate(self, text):
    """
    This method accesses preceding locators if these exist, it then calls an overridable helper method called _relocate
    which receives text with readjusted boundaries and searches inside, the basic implemented behaviour is that of a logical or
    """
    if not self.precedents:
        for sub_segment in self._relocate(text, Segment(0, len(text), 1)):
            if self._multiple:
                yield sub_segment
            elif self.max_segment.prob > self._prob_threshold:
                yield self.max_segment
                return
    else:
        for precedent in self.precedents:
            for segment in precedent.locate(text):
                for sub_segment in self._relocate(text, segment):
                    if self._multiple:
                        yield sub_segment
                    elif self.max_segment.prob > self._prob_threshold:
                        yield self.max_segment
                        return
    # if we haven't found a good enough segment return the best one we came across while locating
    if not self._multiple:
        yield self.max_segment
It contains the same code twice:
for sub_segment in self._relocate(text, segment):
    if self._multiple:
        yield sub_segment
    elif self.max_segment.prob > self._prob_threshold:
        yield self.max_segment
        return
I naively thought I could define a single helper method and have the code just once, so I started to implement it. However, this proved next to impossible (because the code uses both yields and returns) and cost me much more in code length and run time than it was worth.
I'm not sure what I'm asking exactly (if anything, I'm asking whether anyone knows a general approach to sharing generator code that yields, or sees how it can be done here), but as generators go I found this experience quite telling and interesting, so I thought I'd share.
I think you can remove the code duplication by defining a generator of segments outside the loop:
def _locate(self, text):
    """
    This method accesses preceding locators if these exist, it then calls an overridable helper method called _relocate
    which receives text with readjusted boundaries and searches inside, the basic implemented behaviour is that of a logical or
    """
    if self.precedents:
        segments = (seg for precedent in self.precedents for seg in precedent.locate(text))
    else:
        segments = (Segment(0, len(text), 1),)
    for segment in segments:
        for sub_segment in self._relocate(text, segment):
            if self._multiple:
                yield sub_segment
            elif self.max_segment.prob > self._prob_threshold:
                yield self.max_segment
                return
    # if we haven't found a good enough segment return the best one we came across while trying
    if not self._multiple:
        yield self.max_segment
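On the more general question of sharing generator code that both yields and returns: in Python 3.3 and later, a helper generator can yield values to the consumer and also return a flag to the generator that delegates to it with yield from. The sketch below is only an illustration of that pattern. It uses plain numbers instead of Segment objects, and emit_matches, locate and threshold are hypothetical names, not part of the original code.

```python
def emit_matches(candidates, threshold, multiple):
    """Shared loop body (hypothetical helper).

    Yields results to the consumer and *returns* True when the caller
    should stop early, False otherwise.
    """
    for value in candidates:
        if multiple:
            yield value
        elif value > threshold:
            yield value
            return True  # tell the delegating generator to stop
    return False

def locate(groups, threshold, multiple=False):
    for group in groups:
        # `yield from` forwards every yielded value and evaluates to
        # the helper's return value once the helper is exhausted.
        stop = yield from emit_matches(group, threshold, multiple)
        if stop:
            return

first_hit = list(locate([[1, 2], [5, 6], [7]], threshold=4))
everything = list(locate([[1, 2], [5]], threshold=0, multiple=True))
print(first_hit, everything)
```

The early return inside the helper ends only the helper; the returned flag is what lets the outer generator decide to stop as well.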

Python understanding method for hashing class

This was meant to be the second part of my previous question but I decided to ask them as separate questions. I'm working through the following code implementing a hashtable from the MIT lecture notes/videos. The lecturer does not explain his code so I can't get the answer to the questions from the video. I'm new to OOP and I would like to fully understand this particular method. Here is the code that is implemented:
class intSet(object):
    #An intSet is a set of integers
    def __init__(self):
        """Create an empty set of integers"""
        self.numBuckets = 47
        self.vals = []
        for i in range(self.numBuckets):
            self.vals.append([])
    def hashE(self, e):
        #Private function, should not be used outside of class
        return abs(e)%len(self.vals)
    def insert(self, e):
        """Assumes e is an integer and inserts e into self"""
        for i in self.vals[self.hashE(e)]:
            if i == e: return
        self.vals[self.hashE(e)].append(e)
    def member(self, e):
        """Assumes e is an integer
        Returns True if e is in self, and False otherwise"""
        return e in self.vals[self.hashE(e)]
    def __str__(self):
        """Returns a string representation of self"""
        elems = []
        for bucket in self.vals:
            for e in bucket: elems.append(e)
        elems.sort()
        result = ''
        for e in elems: result = result + str(e) + ','
        return '{' + result[:-1] + '}'
I do not understand why the method insert(self,e) works. Here is my understanding.
The value e is only appended if the return statement is executed, and this depends on the if statement if i==e. I believe that, since initially self.vals is just a list of empty lists, this if statement will never be true and thus nothing will be returned. However, in the lecture video the code works fine. Why is this the case?
Am I reading the code wrong with the indentations? I am new to Python so is it the case that perhaps if i==e is true the method returns nothing, otherwise it skips to the last line and appends the value, thus ensuring an element is not added twice? I appreciate any help, thanks!!
"[I]s it the case that perhaps if i==e is true the method returns
nothing, otherwise it skips to the last line and appends the value,
thus ensuring an element is not added twice?"
Yes, this is what happens. You figured it out! Good job. You said you're new, so a suggestion for the future is to start up Python in interactive mode, which you can do by just running python in your command line. In there you can test simple code (such as this insert function) and see its behavior, which might help you understand what's going on in a function all by yourself.
Here is a tutorial I found on interactive mode if you're interested.
if i==e: return is a conditional, but it only takes up one line, in which case an indent is not required.
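To make the control flow concrete, here is a cut-down copy of the class with insert written over several lines. IntSetSketch is a name used only for this illustration; the logic of hashE and insert is the same as in the question.

```python
class IntSetSketch:
    """Cut-down copy of intSet with only the parts needed to trace insert."""
    def __init__(self):
        self.vals = [[] for _ in range(47)]

    def hashE(self, e):
        return abs(e) % len(self.vals)

    def insert(self, e):
        for i in self.vals[self.hashE(e)]:
            if i == e:
                return  # already present: leave without appending
        # Reached whenever the loop finishes without returning,
        # including when the bucket is empty and the body never runs.
        self.vals[self.hashE(e)].append(e)

s = IntSetSketch()
s.insert(3)   # empty bucket: loop runs zero times, 3 is appended
s.insert(3)   # 3 is found in the bucket: return fires, nothing is appended
s.insert(50)  # 50 % 47 == 3, same bucket, but 50 != 3: appended
bucket = s.vals[3]
print(bucket)
```

The append sits outside the for loop, so it runs exactly when no equal element was found.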

Twisted inlineCallbacks and remote generators

I have used defer.inlineCallbacks in my code as I find it much easier to read and debug than using addCallbacks.
I am using PB and I have hit a problem when returning data to the client. The data is about 18Mb in size and I get a failed BananaError because of the length of the string being returned.
What I want to do is to write a generator so I can just keep calling the function and return some of the data each time the function is called.
How would I write that with inlineCallbacks already being used? Is it actually possible, or do I have to return a value instead? Would something like the following work?
@defer.inlineCallbacks
def getLatestVersions(self):
    returnlist = []
    try:
        latest_versions = yield self.cur.runQuery("""SELECT id, filename,path,attributes ,MAX(version) ,deleted ,snapshot , modified, size, hash,
        chunk_table, added, isDir, isSymlink, enchash from files group by filename, path""")
    except:
        logger.exception("problem querying latest versions")
    for result in latest_versions:
        returnlist.append(result)
        if len(returnlist) >= 10:
            yield returnlist
            returnlist = []
    yield returnlist
A generator function decorated with inlineCallbacks returns a Deferred - not a generator. This is always the case. You can never return a generator from a function decorated with inlineCallbacks.
See the pager classes in twisted.spread.util for ideas about another approach you can take.
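Independently of Twisted, the batching the question is after can be sketched as a plain generator that is kept separate from the inlineCallbacks function. This is only an illustration of splitting a result set into pages; chunked and the page size of 10 are hypothetical, and how each page is delivered over PB is not shown here.

```python
def chunked(rows, size):
    """Yield successive lists of at most `size` rows."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:  # final partial batch; skipped when it would be empty
        yield batch

pages = list(chunked(range(25), 10))
print([len(page) for page in pages])
```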

Python loop | "do-while" over a tree

Is there a more Pythonic way to put this loop together?:
while True:
    children = tree.getChildren()
    if not children:
        break
    tree = children[0]
UPDATE:
I think this syntax is probably what I'm going to go with:
while tree.getChildren():
    tree = tree.getChildren()[0]

children = tree.getChildren()
while children:
    tree = children[0]
    children = tree.getChildren()
It would be easier to suggest something if I knew what kind of collection api you're working with. In a good api, you could probably do something like
while tree.hasChildren():
    children = tree.getChildren()
    tree = children[0]
(My first answer suggested to use iter(tree.getChildren, None) directly, but that won't work as we are not calling the same tree.getChildren function all the time.)
To fix this, I propose a workaround that relies on a lambda's late binding of its variables: the lambda looks up tree each time it is called, so it sees the reassignment made in the loop body. I don't think this solution is better than any of those previously posted.
You can use iter() in its second, sentinel form:
for children in iter((lambda: tree.getChildren()), None):
    tree = children[0]
(This assumes getChildren() returns None when there are no children; replace None with whatever value it actually returns ([]?).)
iter(function, sentinel) calls function repeatedly until it returns the sentinel value.
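A small runnable sketch of the sentinel form follows. The Tree class is hypothetical (the question does not show the real API), and it assumes getChildren() returns an empty list for a leaf, so [] is used as the sentinel.

```python
class Tree:
    """Hypothetical node type; getChildren() returns a list, empty for a leaf."""
    def __init__(self, label, children=None):
        self.label = label
        self._children = children or []

    def getChildren(self):
        return self._children

tree = Tree("root", [Tree("a", [Tree("a1"), Tree("a2")]), Tree("b")])

# The lambda looks up `tree` on every call, so each iteration asks the
# node that the loop body most recently assigned.
for children in iter(lambda: tree.getChildren(), []):
    tree = children[0]

print(tree.label)
```

The loop stops as soon as a call returns a value equal to the sentinel, leaving tree bound to the leftmost leaf.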
Do you really only want the first branch? I'm gonna assume you don't and that you want the whole tree. First I'd do this:
def allitems(tree):
    for child in tree.getChildren():
        yield child
        for grandchild in allitems(child):
            yield grandchild
This will go through the whole tree. Then you can just:
for item in allitems(tree):
    do_whatever_you_want(item)
Pythonic, simple, clean, and since it uses generators, will not use much memory even for huge trees.
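On Python 3.3 and later, the inner loop of such a traversal can be written with yield from. A minimal sketch, assuming a hypothetical Tree class whose getChildren() returns a list (empty for leaves):

```python
class Tree:
    """Hypothetical node type used only for this illustration."""
    def __init__(self, label, children=None):
        self.label = label
        self._children = children or []

    def getChildren(self):
        return self._children

def allitems(tree):
    for child in tree.getChildren():
        yield child                  # visit the child first...
        yield from allitems(child)   # ...then all of its descendants

root = Tree("root", [Tree("a", [Tree("a1"), Tree("a2")]), Tree("b")])
labels = [node.label for node in allitems(root)]
print(labels)
```

The traversal is depth-first and does not include the root itself, matching the loop-based version.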
I think the code you have is fine. If you really wanted to, you could wrap it all up in a try/except:
while True:
    try:
        tree = tree.getChildren()[0]
    except (IndexError, TypeError):
        break
IndexError will work if getChildren() returns an empty list when there are no children. If it returns False or 0 or None or some other unsubscriptable false-like value, TypeError will handle the exception.
But that's just another way to do it. Again, I don't think the Pythonistas will hunt you down for the code you already have.
Without further testing, I believe this should work:
try:
    while True: tree = tree.getChildren()[0]
except (IndexError, TypeError): pass
You might also want to override __getitem__() (the brackets operator) in the Tree class, for further neatification.
try:
    while True: tree = tree[0]
except (IndexError, TypeError): pass
