Python loop | "do-while" over a tree - python

Is there a more Pythonic way to put this loop together?
while True:
    children = tree.getChildren()
    if not children:
        break
    tree = children[0]
UPDATE:
I think this syntax is probably what I'm going to go with:
while tree.getChildren():
    tree = tree.getChildren()[0]

children = tree.getChildren()
while children:
    tree = children[0]
    children = tree.getChildren()
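On Python 3.8 or later, an assignment expression (the "walrus" operator :=) can fold the duplicated getChildren() call of the loop above into the condition. This is a minimal sketch, not the asker's API: the Node class and the leftmost_leaf helper are hypothetical, and it assumes getChildren() returns an empty list for a leaf.

```python
class Node:
    """Hypothetical tree node: getChildren() returns a (possibly empty) list."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def getChildren(self):
        return self.children


def leftmost_leaf(tree):
    # Requires Python 3.8+ for ":=". The loop stops as soon as
    # getChildren() returns a falsy value (here an empty list).
    while children := tree.getChildren():
        tree = children[0]
    return tree


root = Node("a", [Node("b", [Node("c")]), Node("d")])
print(leftmost_leaf(root).name)  # prints "c"
```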
It would be easier to suggest something if I knew what kind of collection API you're working with. In a good API, you could probably do something like
while tree.hasChildren():
    children = tree.getChildren()
    tree = children[0]

(My first answer suggested using iter(tree.getChildren, None) directly, but that won't work because tree.getChildren is evaluated only once, so we would keep calling the bound method of the original tree rather than of the current one.)
To fix this, I propose a workaround that relies on a lambda looking up its variables at call time (late binding). At this point I don't think this solution is better than any of the others already posted:
You can use iter() in its two-argument (callable, sentinel) form, relying on the lambda's late binding:
for children in iter(lambda: tree.getChildren(), None):
    tree = children[0]
(This assumes getChildren() returns None when there are no children; replace None with whatever value it actually returns in that case, e.g. [].)
iter(function, sentinel) calls function repeatedly until it returns the sentinel value.
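A runnable sketch of the iter(callable, sentinel) idea, assuming a hypothetical Node class whose getChildren() returns [] for a leaf, so the sentinel is [] rather than None. iter() stops when the returned value compares equal to the sentinel, and [] == [] holds. Because the lambda reads the variable tree each time it is called, it sees every reassignment made in the loop body.

```python
class Node:
    """Hypothetical tree node; getChildren() returns [] for a leaf."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def getChildren(self):
        return self.children


def leftmost_leaf(tree):
    # iter(callable, sentinel) calls the lambda until it returns a value
    # equal to the sentinel ([] here). The lambda closes over the variable
    # `tree`, not its value, so it follows the reassignment below.
    for children in iter(lambda: tree.getChildren(), []):
        tree = children[0]
    return tree


root = Node("a", [Node("b", [Node("c")]), Node("d")])
print(leftmost_leaf(root).name)  # prints "c"
```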

Do you really only want the first branch? I'm gonna assume you don't and that you want the whole tree. First I'd do this:
def allitems(tree):
    for child in tree.getChildren():
        yield child
        for grandchild in allitems(child):
            yield grandchild
This will go through the whole tree. Then you can just:
for item in allitems(tree):
    do_whatever_you_want(item)
Pythonic, simple, clean, and since it uses generators, will not use much memory even for huge trees.
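On Python 3.3 and later the inner re-yield loop of allitems can be written with yield from. A minimal sketch with a hypothetical Node class; like the original, it yields every descendant of the root (not the root itself) in depth-first, pre-order sequence.

```python
class Node:
    """Hypothetical tree node exposing getChildren()."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def getChildren(self):
        return self.children


def allitems(tree):
    """Yield every descendant of tree, depth-first, pre-order."""
    for child in tree.getChildren():
        yield child
        # yield from replaces "for grandchild in allitems(child): yield grandchild"
        yield from allitems(child)


root = Node("a", [Node("b", [Node("c")]), Node("d")])
print([item.name for item in allitems(root)])  # ['b', 'c', 'd']
```

Memory use stays low because only one generator per level is alive at a time, but each level is still a recursive call, so a tree deeper than the interpreter's recursion limit (1000 by default) would raise RecursionError.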

I think the code you have is fine. If you really wanted to, you could wrap it all up in a try/except:
while True:
    try:
        tree = tree.getChildren()[0]
    except (IndexError, TypeError):
        break
IndexError will work if getChildren() returns an empty list when there are no children. If it returns False or 0 or None or some other unsubscriptable false-like value, TypeError will handle the exception.
But that's just another way to do it. Again, I don't think the Pythonistas will hunt you down for the code you already have.

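Something like this should work (note that try: and while True: must go on separate lines, because only simple statements can follow a colon on the same line):
try:
    while True:
        tree = tree.getChildren()[0]
except IndexError:  # assuming getChildren() returns [] for a leaf
    pass
You might also want to override __getitem__() (the bracket operator) in the Tree class to make this even neater:
try:
    while True:
        tree = tree[0]
except IndexError:
    pass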
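A minimal sketch of the __getitem__ override suggested above, using a hypothetical Tree class that delegates indexing to its children list, so tree[0] raises IndexError on a leaf. It catches IndexError specifically rather than using a bare except, so an unrelated bug inside the loop (a NameError, say) is not silently swallowed.

```python
class Tree:
    """Hypothetical tree whose [] operator indexes into its children."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def __getitem__(self, index):
        # Delegate to the children list: tree[0] raises IndexError on a leaf.
        return self.children[index]


def leftmost_leaf(tree):
    try:
        while True:
            tree = tree[0]
    except IndexError:
        # The current node has no children, so we have reached a leaf.
        pass
    return tree


root = Tree("a", [Tree("b", [Tree("c")]), Tree("d")])
print(leftmost_leaf(root).name)  # prints "c"
```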


Why doesn't python Queue have falsey behavior

In python, I have gotten quite used to container objects having truthy behavior when they are populated, and falsey behavior when they are not:
# list
>>> a = []
>>> not a
True
>>> a.append(1)
>>> not a
False
# deque
>>> from collections import deque
>>> d = deque()
>>> not d
True
>>> d.append(1)
>>> not d
False
# and so on
However, queue.Queue does not have this behavior. To me, this seems odd and inconsistent with almost every other container data type I can think of. Furthermore, Queue's empty() method seems to go against coding conventions that avoid race conditions on other objects (checking whether a file exists, checking whether a list is empty, etc.). For example, we would generally say the following is bad practice:
_queue = []
if not len(_queue):
    ...  # do something
And should be replaced with
_queue = []
if not _queue:
    ...  # do something
or we would handle an IndexError, although we might still argue the if not _queue check is better:
try:
    x = _queue.pop()
except IndexError as e:
    logger.exception(e)
    # do something else
Yet, Queue requires someone to do one of the following:
import queue

_queue = queue.Queue()
if _queue.empty():
    ...  # do something
    # though this smells like a race condition

# or handle an exception
try:
    _queue.get(timeout=5)
except queue.Empty as e:
    ...  # do something else
    # maybe logger.exception(e)
Is there documentation somewhere that explains why this design choice was made? It seems odd, especially since the source code shows it is built on top of collections.deque (note that Queue does not inherit from deque).
According to the definition of the truth value testing procedure, the behavior is expected:
Any object can be tested for truth value, for use in an if or while
condition or as operand of the Boolean operations below.
By default, an object is considered true unless its class defines
either a __bool__() method that returns False or a __len__() method
that returns zero, when called with the object.
As Queue implements neither __bool__() nor __len__(), its truth value is always True. As to why Queue does not implement __len__(), a clue can be found in the docstring of its qsize() method:
'''Return the approximate size of the queue (not reliable!).'''
The same reasoning applies to __bool__().
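A quick check of the truth-value rule quoted above, comparing an empty collections.deque with an empty queue.Queue. It only uses the standard library and shows that the difference comes from deque defining __len__ while Queue defines neither __len__ nor __bool__.

```python
import queue
from collections import deque

d = deque()
q = queue.Queue()

# deque defines __len__, so an empty deque is falsy.
print(bool(d))  # False

# Queue defines neither __bool__ nor __len__, so the default applies:
# every Queue instance is truthy, even when it holds nothing.
print(bool(q))  # True
print(hasattr(queue.Queue, "__len__"), hasattr(queue.Queue, "__bool__"))  # False False

# qsize() is the explicit alternative, documented as approximate.
print(q.qsize())  # 0
```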
I'm going to leave the accepted answer as is, but as far as I can tell, the reason is that if _queue: # do something would be a race condition, since Queue is designed to be passed between threads and therefore possesses dubious state as far as tasks go.
From the source:
class Queue:
    # ~snip~
    def qsize(self):
        '''Return the approximate size of the queue (not reliable!).'''
        with self.mutex:
            return self._qsize()

    def empty(self):
        '''Return True if the queue is empty, False otherwise (not reliable!).
        This method is likely to be removed at some point. Use qsize() == 0
        as a direct substitute, but be aware that either approach risks a race
        condition where a queue can grow before the result of empty() or
        qsize() can be used.
        To create code that needs to wait for all queued tasks to be
        completed, the preferred technique is to use the join() method.
        '''
        with self.mutex:
            return not self._qsize()
    # ~snip~
I must have missed this helpful docstring when I was originally looking. The boolean you get from qsize() or empty() is not tied to the state of the queue once it has been evaluated, so the user ends up processing the queue based on an already out-of-date state.
Like checking the existence of a file, it's more pythonic to just handle the exception:
try:
    task = _queue.get(timeout=4)
except queue.Empty as e:
    ...  # do something
since the exception/success against get is the state of the queue.
Likewise, we would not do:
if os.path.exists(file):
    with open(file) as fh:
        ...  # do processing
Instead, we would do:
try:
    with open(file) as fh:
        ...  # do processing
except FileNotFoundError as e:
    ...  # do something else
I suppose the author intentionally left out the __bool__ method to steer developers away from relying on such a paradigm, and toward treating the queue like any other object whose state is questionable.
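A minimal sketch of the "just call get() and handle queue.Empty" style described above. The produce and drain helpers are hypothetical, and the 0.1 second timeout is an arbitrary choice. The producer thread is joined before draining only to keep the demo deterministic; with a producer running concurrently, a timeout-based stop can end early if the producer is slow.

```python
import queue
import threading


def produce(q, n):
    # Hypothetical producer: puts the integers 0..n-1 on the queue.
    for i in range(n):
        q.put(i)


def drain(q, timeout=0.1):
    """Collect items until get() times out, instead of checking empty() first."""
    items = []
    while True:
        try:
            item = q.get(timeout=timeout)
        except queue.Empty:
            # Nothing arrived within `timeout` seconds: treat the queue as drained.
            return items
        items.append(item)
        q.task_done()  # keeps q.join() accurate if another thread waits on it


q = queue.Queue()
producer = threading.Thread(target=produce, args=(q, 5))
producer.start()
producer.join()
result = drain(q)
print(result)  # [0, 1, 2, 3, 4]
```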

python recursion not working with yield

I made a tree data structure and a function that yields all its leaves, but the recursive algorithm never seems to work for any of the child nodes. The function gets called once using the root node.
def get_files(self, initials):
    for child in self.children:
        name = initials + os.sep + child.name
        if child.children == []:
            yield name
        else:
            child.get_files(name)
full class: https://pastebin.com/4eukaVWx
if child.children == []:
    yield name
else:
    child.get_files(name)
Here you're yielding only in the if. In the other branch, the data is lost. You need to yield the elements returned by child.get_files(name). I'd do:
if not child.children:
    yield name
else:
    yield from child.get_files(name)
yield from is available in Python 3.3 and later. An alternative for older versions is a loop:
for item in child.get_files(name):
    yield item
(a similar issue happens a lot with functions: Why does my function return None?)
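A self-contained version of the fix, assuming a hypothetical Node class that mirrors the question's name/children attributes (the full class on pastebin is not reproduced here). Calling child.get_files(name) without yield from only creates a generator object and throws it away, which is why the leaves below the first level were lost.

```python
import os


class Node:
    """Hypothetical stand-in for the tree class in the question."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def get_files(self, initials):
        for child in self.children:
            name = initials + os.sep + child.name
            if not child.children:
                yield name  # leaf: emit its path
            else:
                # Re-yield everything the recursive generator produces.
                yield from child.get_files(name)


root = Node("root", [Node("a", [Node("x.txt")]), Node("b.txt")])
files = list(root.get_files("root"))
print(files)  # on POSIX: ['root/a/x.txt', 'root/b.txt']
```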
Not a solution but an observation:
I guess you are printing something in the pastebin code and trimmed the print statements to post an MVE in the question. It works completely fine without the print statements, but as soon as you put a single print statement in the method, the recursion stops happening.

Avoid extra line for attribute check?

I am developing this Python project where I encounter a situation many times and I wondered if there is a better way.
There is a list of class instances. Some elements of the list are empty (filled with None).
Here is an example list.
ins_list = [ins_1, ins_2, None, ins_3, None]
I have to do some checks throughout the program flow. At certain points I need to check an attribute of these instances, but only an index is given for choosing an instance from the list, and it may point to one of the empty elements, which raises an error when the attribute is accessed. Here is an example program flow.
ind = 2
if ins_list[ind].some_attribute == "thing":
    # This raises AttributeError when an empty (None) element is selected.
I deal with this by using:
if ins_list[ind]:
    if ins_list[ind].some_attribute == "thing":
        # This works
I am okay with using this. However, the program is long and I apply this hundreds of times, which means I am producing redundant code and increasing the indentation level for no reason. Is there an easier, better way of doing this?
Use the boolean operator and.
if ins_list[ind] and ins_list[ind].some_attribute == "thing":
    # Code
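A small runnable check of the and approach. The Item class and the sample list are hypothetical stand-ins for the question's instances. The right-hand side is only evaluated when the left-hand side is truthy, so a None slot never reaches the attribute access.

```python
class Item:
    """Hypothetical element type carrying the attribute from the question."""

    def __init__(self, some_attribute):
        self.some_attribute = some_attribute


ins_list = [Item("thing"), Item("other"), None, Item("thing"), None]


def is_thing(lst, ind):
    # `and` short-circuits: when lst[ind] is None (falsy), the attribute
    # access on the right is never evaluated, so no AttributeError is raised.
    return bool(lst[ind] and lst[ind].some_attribute == "thing")


print([is_thing(ins_list, i) for i in range(len(ins_list))])
# [True, False, False, True, False]
```

If the element class could itself be falsy (for example because it defines __len__), writing lst[ind] is not None and ... keeps such elements from being skipped.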
As coder proposed, you can remove None from your list, or use dictionaries instead, to avoid having to create an entry for each index.
I want to propose another way: you can create a dummy class and replace None with an instance of it. This way there will be no error when you read or set an attribute:
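class dummy:
    def __bool__(self):  # Python 3 name; Python 2 used __nonzero__
        return False
    def __getattr__(self, k):  # reading a missing attribute returns None instead of raising
        return None
    def __setattr__(self, k, v):
        return  # silently ignore attribute assignment
mydummy = dummy()
mylist = [ins_1, ins_2, mydummy, ins_3, mydummy]
Nothing will be stored on the dummy instances when setting an attribute, and reading one returns None.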
edit:
If the content of the original list cannot be chosen, then this class could help:
class PickyList(list):
    def __init__(self, iterable, dummyval):
        super().__init__(iterable)
        self.dummy = dummyval
    def __getitem__(self, k):
        v = super().__getitem__(k)
        # hand back the dummy object wherever None is stored
        return self.dummy if v is None else v
mylist = PickyList(ins_list, mydummy)
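A self-contained Python 3 sketch of the dummy-plus-PickyList idea. It is an adaptation rather than the answer's exact code: the class is named Dummy, it uses __bool__ (the Python 3 name for __nonzero__), and it adds a __getattr__ returning None so that reading some_attribute on an empty slot does not raise. SimpleNamespace stands in for the question's instances.

```python
from types import SimpleNamespace


class Dummy:
    """Placeholder for empty slots (illustrative Python 3 version)."""

    def __bool__(self):  # Python 3 spelling of Python 2's __nonzero__
        return False

    def __getattr__(self, name):  # only called when normal lookup fails
        return None

    def __setattr__(self, name, value):  # silently ignore assignments
        pass


class PickyList(list):
    """list that hands back a dummy object wherever it stores None."""

    def __init__(self, iterable, dummyval):
        super().__init__(iterable)
        self.dummy = dummyval

    def __getitem__(self, k):
        v = super().__getitem__(k)
        return self.dummy if v is None else v


mydummy = Dummy()
ins_list = [SimpleNamespace(some_attribute="thing"), None]
mylist = PickyList(ins_list, mydummy)

print(mylist[0].some_attribute == "thing")  # True
print(mylist[1].some_attribute == "thing")  # False (the dummy returns None)
print(bool(mylist[1]))                      # False
```

PickyList only changes integer indexing: iterating with for x in mylist, or taking a slice, still produces the raw None values, because list's own __iter__ and slicing do not route through the overridden lookup for each element.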
There are these two options:
Using a dictionary:
Another way would be to use a dictionary instead. You could create your dictionary once the list is filled with elements. The dictionary's keys would be the values of your list, and as values you could use the attributes of the elements that are not None and "No_attr" for those that are None. (Note: keep in mind that Python dictionaries don't support duplicate keys, which is why I propose below to use your list indexes as keys; otherwise you will have to find a way to make the keys distinct.)
For example for a list like:
l = [item1, item2, None, item4]
You could create a dictionary:
d = {item1: "thing1", item2: "thing2", None: "No_attr", item4: "thing4"}
So this way, every time you need to make a check, you don't have to check two conditions; you can check only the value, such as:
if list(d.values())[your_index] == "thing":  # dict views are not indexable in Python 3
The only drawback of this method is that standard Python dictionaries were unordered before Python 3.7 (insertion order is only guaranteed from 3.7 on), which makes accessing dictionary values by index risky: you have to be careful not to change the arrangement of the dictionary.
Now, if you want to make sure the index stays stable, you would have to store it somehow, for example by using the indexes as the keys of your dictionary, since you will already have stored the attributes of the items. But that is something you will have to decide, and it depends strongly on the architecture of your project.
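A minimal sketch of the index-keyed variant mentioned above, assuming the attribute is read once when the list is complete. SimpleNamespace objects stand in for the question's instances, and "No_attr" is the marker value from the answer.

```python
from types import SimpleNamespace

ins_list = [
    SimpleNamespace(some_attribute="thing"),
    SimpleNamespace(some_attribute="other"),
    None,
]

# Build once after the list is filled: index -> attribute, "No_attr" for None.
attrs = {
    i: (item.some_attribute if item is not None else "No_attr")
    for i, item in enumerate(ins_list)
}

ind = 2
print(attrs[ind] == "thing")  # False, and no AttributeError for the None slot
```

The dictionary is a snapshot: if elements of the list or their attributes change later, it has to be rebuilt or kept in sync by hand.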
Using a list:
With lists, I don't think there is a way to avoid your if statement, and it is not actually bad. Maybe use an and operator as mentioned in another answer, but I don't think that makes much difference anyway.
Also, if you want to use your first approach:
if ins_list[ind].some_attribute == "thing":
You could try using an exception catcher like this:
try:
    if ins_list[ind].some_attribute == "thing":
        pass  # do something
except:
    # an error occurred
    pass
In this case I would use a try-except statement because of EAFP (easier to ask for forgiveness than permission). It won't shorten your code, but it's a more Pythonic way to code when checking for valid attributes. This way you won't break DRY (Don't Repeat Yourself) either.
try:
    if ins_list[ind].some_attribute == "thing":
        do_something()
except AttributeError:
    do_something_else()
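A runnable EAFP sketch of the same check. The is_thing helper and the sample list are hypothetical. Only the attribute access sits inside the try block, so an AttributeError raised by unrelated code (for example inside do_something()) is not mistaken for an empty slot.

```python
from types import SimpleNamespace

ins_list = [SimpleNamespace(some_attribute="thing"), None]


def is_thing(ind):
    try:
        attr = ins_list[ind].some_attribute
    except AttributeError:
        # The slot holds None, which has no some_attribute.
        return False
    return attr == "thing"


print(is_thing(0), is_thing(1))  # True False
```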

How to make a function return nothing?

I have a function called crawl which will return a link to a website. Then I do something like:
found.append(crawl()) (found is a list)
This works fine as long as crawl returns a valid link, but sometimes it does not return anything. So a value of None gets added to the list.
So my question is: is it possible to return something from crawl that will not add anything to the list?
In Python, nothing is something: None. You have to use an if condition:
link = crawl()
if link is not None:
    found.append(link)
or let crawl return a list, which can contain one or zero elements:
found.extend(crawl())
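A minimal sketch of the "return a list of zero or one elements" idea. This crawl() takes a hypothetical candidate argument and treats only http(s) URLs as found links; the real crawler's logic is not shown in the question.

```python
def crawl(candidate):
    """Hypothetical crawl(): returns [link] if one is found, else []."""
    if candidate.startswith(("http://", "https://")):
        return [candidate]
    return []


found = []
for candidate in ["https://example.com", "not-a-link", "http://example.org"]:
    found.extend(crawl(candidate))  # extending with [] adds nothing

print(found)  # ['https://example.com', 'http://example.org']
```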
What you could do is to pass the list in to the crawl function and if there is anything to add append it otherwise not. Something like:
def crawl(found):
    """ Add results to found list """
    # do your stuff, saving to result
    if result is not None:
        found.append(result)

# Call this as
crawl(found)
This is not possible directly.
You can test for a None being returned though, using the following code.
returned_link = crawl()
if returned_link is not None:
    found.append(returned_link)
If a function returns it has to return an object, even if that object is None. However, there is another answer being overlooked, and that is to raise an exception rather than returning None.
As other people point out, one approach is to check if the returned object is None before appending it to the list:
link = crawl()
if link is not None:
    found.append(link)
The other approach is to define an exception, perhaps WebsiteNotFoundError, and have crawl raise WebsiteNotFoundError instead of returning None. Then you can write:
try:
    found.append(crawl())
except WebsiteNotFoundError:
    pass  # or take appropriate action
The advantage of the exception handling approach is that it is generally faster than checking for None if returning None is a relatively rare occurrence compared to returning a valid link. Depending on the use, it may be more readable in the sense that the code naturally explains what is going wrong with the function.
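A self-contained sketch of the exception-based approach. WebsiteNotFoundError is the name the answer proposes; the crawl() body and its candidate argument are hypothetical stand-ins.

```python
class WebsiteNotFoundError(Exception):
    """Raised by crawl() when no link is found."""


def crawl(candidate):
    # Hypothetical stand-in: treat only http(s) URLs as found links.
    if candidate.startswith(("http://", "https://")):
        return candidate
    raise WebsiteNotFoundError(candidate)


found = []
for candidate in ["https://example.com", "not-a-link", "http://example.org"]:
    try:
        found.append(crawl(candidate))
    except WebsiteNotFoundError:
        pass  # or log / take another action

print(found)  # ['https://example.com', 'http://example.org']
```

append() is only reached when crawl() returns normally, so a failed crawl can never leave a placeholder value in the list.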

Workarounds to suspend (serialize) and resume a recursive generator stack?

I have a recursive generator function that creates a tree of ChainMap contexts, and finally does something with the context at the end of the tree. It looks like this (parent_context is a ChainMap, hierarchy is a list):
def recursive_generator(parent_context, hierarchy):
    next_level = hierarchy[0]
    next_level_contexts = get_contexts(next_level)  # returns a list of dicts
    for context in next_level_contexts:
        child_context = parent_context.new_child()
        child_context.update(context)  # update() returns None, so it can't be chained
        if next_level == hierarchy[-1]:
            yield do_something(**child_context)
        else:
            yield from recursive_generator(child_context, hierarchy[1:])
Now I'd like to flag one level of the hierarchy such that the operation suspends after finishing that level, serializes the state to disk to be picked up later where it left off. Is there a way to do this without losing the elegance of the recursion?
I know that you can't pickle generators, so I thought about refactoring into an iterator object. But I think yield from is necessary for the recursion here (edit: at least without some tedious management of the stack), so I think it needs to be a generator, no? Is there a workaround for this?
you seem to be exploring a tree with DFS. so you could construct the tree in memory and make the DFS explicit. then just store the tree and restart at the left-most node (i think?).
that's effectively "tedious management of the stack", but it has a nice picture that would help implement it (at least for me, looking at your problem as DFS of a tree makes the implementation seem fairly obvious - before i thought of it like that, it seemed quite complicated - but i may be missing something).
sorry if that's obvious and insufficient...
[edit]
class Inner:
    def __init__(self, parent_context, hierarchy):
        self.children = []
        next_level = hierarchy[0]
        next_level_contexts = get_contexts(next_level)
        for context in next_level_contexts:
            child_context = parent_context.new_child()
            child_context.update(context)  # update() returns None, so don't chain it
            if next_level == hierarchy[-1]:
                self.children.append(Leaf(child_context))
            else:
                self.children.append(Inner(child_context, hierarchy[1:]))
    def do_something(self):
        # this will do something on the left-most leaf
        self.children[0].do_something()
    def prune(self):
        # this will remove the left-most leaf
        if isinstance(self.children[0], Leaf):
            self.children.pop(0)
        else:
            self.children[0].prune()
            if not self.children[0]:
                self.children.pop(0)
    def __bool__(self):
        return bool(self.children)
class Leaf:
    def __init__(self, context):
        self.context = context
    def do_something(self):
        # calls the module-level do_something(), not this method
        do_something(**self.context)
the code above hasn't been tested. i ended up using classes for nodes as a tuple seemed too confusing. you create the tree by creating the parent node. then you can "do something" by calling do_something, after which you will want to remove the "done" leaf with prune:
tree = Inner(initial_context, initial_hierarchy)
while tree:
    tree.do_something()
    tree.prune()
i am pretty sure it will contain bugs, but hopefully it's enough to show the idea. sorry i can't do more but i need to repot plants....
ps it's amusing that you can write code with generators, but didn't know what DFS was. you might enjoy reading the "algorithm design manual" - it's part textbook and part reference, and it doesn't treat you like an idiot (i too have no formal education in computer science, and i thought it was a good book).
[edited to change to left-most first, which is what you had before, i think]
and alko has a good point...
Here's what I ended up doing:
from collections import ChainMap

def recursive_generator(parent_context, hierarchy):
    next_level = hierarchy[0]
    next_level_contexts = get_contexts(next_level)  # returns a list of dicts
    for context in next_level_contexts:
        child_context = parent_context.new_child()
        child_context.update(context)  # update() returns None, so it can't be chained
        if next_level == hierarchy[-1]:
            yield child_context
        else:
            yield from recursive_generator(child_context, hierarchy[1:])
def traverse_tree(hierarchy):
    return list(recursive_generator(ChainMap(), hierarchy))
def do_things(contexts, start, stop):
    for context in contexts[start:stop]:
        yield do_something(**context)
Then I can pickle the list returned by traverse_tree and later load it and run it in pieces with do_things. This is all in a class with a lot more going on of course, but this gets to the gist of it.
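A self-contained sketch of the "flatten, pickle, resume" approach described above. get_contexts() is a hypothetical stand-in returning two dicts per level, each leaf context is flattened to a plain dict before pickling, and do_things takes a work callable in place of do_something() so the example runs on its own. The saved state is just the list plus the index to resume from.

```python
import pickle
from collections import ChainMap


def get_contexts(level):
    # Hypothetical stand-in for the question's get_contexts(): a list of dicts.
    return [{level: 0}, {level: 1}]


def recursive_generator(parent_context, hierarchy):
    next_level = hierarchy[0]
    for context in get_contexts(next_level):
        child_context = parent_context.new_child()
        child_context.update(context)
        if next_level == hierarchy[-1]:
            yield dict(child_context)  # flatten the ChainMap to a plain dict
        else:
            yield from recursive_generator(child_context, hierarchy[1:])


def do_things(contexts, start, stop, work):
    for context in contexts[start:stop]:
        yield work(**context)


contexts = list(recursive_generator(ChainMap(), ["a", "b"]))

# Process the first half, then save the contexts plus the resume position.
first = list(do_things(contexts, 0, 2, lambda a, b: (a, b)))
saved = pickle.dumps({"contexts": contexts, "next": 2})

# Later (e.g. in another process): load the state and continue.
state = pickle.loads(saved)
rest = list(do_things(state["contexts"], state["next"],
                      len(state["contexts"]), lambda a, b: (a, b)))
print(first, rest)  # [(0, 0), (0, 1)] [(1, 0), (1, 1)]
```

The trade-off is that the whole traversal is materialized up front, so this suits hierarchies whose leaf count fits comfortably in memory.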
