How to write a Python generator function that never yields anything

I want to write a Python generator function that never actually yields anything. Basically it's a "do-nothing" drop-in that can be used by other code which expects to call a generator (but doesn't always need results from it). So far I have this:
def empty_generator():
    # ... do some stuff, but don't yield anything
    if False:
        yield
Now, this works OK, but I'm wondering if there's a more expressive way to say the same thing, that is, declare a function to be a generator even if it never yields any value. The trick I've employed above is to show Python a yield statement inside my function, even though it is unreachable.

Another way is
def empty_generator():
    return
    yield
Not really "more expressive", but shorter. :)
Note that iter([]) or simply [] will do as well.

An even shorter solution:
def empty_generator():
    yield from []
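For the record, all three variants behave the same; a quick sanity check (inspect.isgeneratorfunction is in the standard library):
import inspect

def empty_generator():
    yield from []

assert inspect.isgeneratorfunction(empty_generator)
assert list(empty_generator()) == []  # runs to completion, yields nothing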

For maximum readability and maintainability, I would prioritize a construct which goes at the top of the function. So either
your original if False: yield construct, but hoisted to the very first line, or
a separate decorator which adds generator behavior to a non-generator callable.
(That's assuming you didn't just need a callable which did something and then returned an empty iterable/iterator. If so then you could just use a regular function and return ()/return iter(()) at the end.)
Imagine the reader of your code sees:
def name_fitting_what_the_function_does():
    # We need this function to be an empty generator:
    if False: yield
    # ... that crucial stuff that this function exists to do
Having this at the top immediately cues every reader of this function in to this detail, which affects the whole function - the expectations and interpretations of its behavior and usage.
How long is your function body? More than a couple of lines? Then as a reader, I will feel righteous fury and condemnation towards the author if I don't get a cue that this function is a generator until the very end, because I will probably have spent significant mental effort weaving a model in my head based on the assumption that this is a regular function. The first yield in a generator should ideally be immediately visible - especially when you don't even know to look for it.
Also, in a function longer than a few lines, a construct at the very beginning of the function is more trustworthy - I can trust that anyone who has looked at a function has probably seen its first line every time they looked at it. That means a higher chance that if that line was mistaken or broken, someone would have spotted it. That means I can be less vigilant for the possibility that this whole thing is actually broken but being used in a way that makes the breakage non-obvious.
If you're working with people who are sufficiently fluently familiar with the workings of Python, you could even leave off that comment, because to someone who immediately remembers that yield is what makes Python turn a function into a generator, it is obvious that this is the effect, and probably the intent since there is no other reason for correct code to have a non-executed yield.
Alternatively, you could go the decorator route:
@generator_that_yields_nothing
def name_fitting_what_the_function_does():
    # ... that crucial stuff for which this exists
    ...

import functools

def generator_that_yields_nothing(wrapped):
    @functools.wraps(wrapped)
    def wrapper_generator():
        if False: yield
        wrapped()
    return wrapper_generator
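For illustration, here's how that decorator might be used (do_side_effects_only is a made-up name):
@generator_that_yields_nothing
def do_side_effects_only():
    print("doing the crucial stuff")

gen = do_side_effects_only()  # nothing runs yet; we just get a generator
assert list(gen) == []        # iterating runs the body and yields nothing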


Advantages of higher-order functions in Python

To implement prettified XML, I have written the following code:
def prettify_by_response(response, prettify_func):
    root = ET.fromstring(response.content)
    return prettify_func(root)

def prettify_by_str(xml_str, prettify_func):
    root = ET.fromstring(xml_str)
    return prettify_func(root)

def make_pretty_xml(root):
    rough_string = ET.tostring(root, "utf-8")
    reparsed = minidom.parseString(rough_string)
    xml = reparsed.toprettyxml(indent="\t")
    return xml

def prettify(response):
    if isinstance(response, str) or isinstance(response, bytes):
        return prettify_by_str(response, make_pretty_xml)
    else:
        return prettify_by_response(response, make_pretty_xml)
In the prettify_by_response and prettify_by_str functions, I pass the function make_pretty_xml as an argument.
Instead of passing the function as an argument, I could simply call it directly, e.g.:
def prettify_by_str(xml_str, prettify_func):
    root = ET.fromstring(xml_str)
    return make_pretty_xml(root)
One advantage of passing the function as an argument over calling it directly is that these functions are not tightly coupled to make_pretty_xml.
What would be the other advantages? Or am I just adding complexity?
This seems very open to biased answers. I'll try to be impartial, but I can't make any promises.
First, higher-order functions (HOFs) are functions that receive and/or return functions. The advantages are debatable, so I'll enumerate the main uses of HOFs and point out the good and the bad of each one.
Callbacks
Callbacks came about as a solution to blocking calls. I need B to happen after A, so I call something that blocks on A and then calls B. This naturally leads to the thought: my system wastes a lot of time waiting for things to happen; what if, instead of waiting, I pass what needs to be done next as an argument? As with anything new in technology, it seems like a good idea until it has to scale.
Callbacks are very common in event systems. If you have ever coded in JavaScript, you know what I'm talking about.
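A minimal sketch of the callback pattern (every name here is invented for illustration):
def fetch_data(on_done):
    # pretend this blocks, or happens asynchronously
    data = {"status": "ok"}
    on_done(data)  # instead of returning, invoke the callback

fetch_data(lambda data: print("got:", data))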
Algorithm abstraction
Some designs, mostly behavioral ones, can use HOFs to choose an algorithm at runtime. You can have a high-level algorithm that receives functions dealing with the low-level details. This leads to more abstraction, code reuse, and portable code. Here, portable means you can write code to handle new low-level cases without changing the high-level ones. This is not exclusive to HOFs, but they can be a great help.
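The built-in sorted is a familiar example of this: a high-level algorithm that receives the low-level detail as a function:
people = [("alice", 30), ("bob", 25)]
by_age = sorted(people, key=lambda p: p[1])   # low-level detail passed in
by_name = sorted(people, key=lambda p: p[0])  # same algorithm, different behavior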
Attaching behavior to another function
The idea here is to take a function as an argument and return a function that does exactly what the argument function does, plus some attached behavior. This is where (I think) HOFs really shine.
Python decorators are a perfect example. They take a function as an argument and return another function, which is then bound to the same identifier as the original function:
@foo
def bar(*args):
    ...
is the same as
def bar(*args):
    ...
bar = foo(bar)
Now, reflect on this code
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n):
    if n < 2:
        return n
    return fib(n-1) + fib(n-2)
fib is just a Fibonacci function: it calculates the nth Fibonacci number. Now lru_cache attaches a new behavior: caching results for previously calculated values. The logic inside the fib function is not tainted by the LRU cache logic. What a beautiful piece of abstraction we have here.
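To see what "attaching behavior" looks like from the inside, here is a hand-rolled sketch of a caching decorator; it is a simplified stand-in, not lru_cache's actual implementation:
import functools

def memoize(func):
    cache = {}
    @functools.wraps(func)
    def wrapper(*args):
        if args not in cache:
            cache[args] = func(*args)  # compute once, remember forever
        return cache[args]
    return wrapper

@memoize
def fib(n):
    return n if n < 2 else fib(n-1) + fib(n-2)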
Applicative style programming or point-free programming
The idea here is to remove variables, or "points", and express algorithms by combining function applications. I'm sure there are lots of people better than me at this subject wandering SO.
As a side note, this is not a very common style in python.
for i in it:
    func(i)

mapped_it = map(func, it)

In the second example, we removed the i variable. This is common in the parsing world. As another side note, the map function is lazy in Python, so the second example has no effect until you iterate over mapped_it.
Your case
In your case, you are just returning the value of the callback call. In fact, you don't need the callback: you can simply chain the calls directly, as you noted, so for this case you don't need HOFs.
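In other words, a direct version might look like this (a sketch that reuses make_pretty_xml from the question):
import xml.etree.ElementTree as ET

def prettify_by_str(xml_str):
    # call make_pretty_xml (defined in the question) directly,
    # instead of receiving it as an argument
    root = ET.fromstring(xml_str)
    return make_pretty_xml(root)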
I hope this helps, and that somebody can show better examples of applicative style :)
Regards

Using a method to create multiple, new instances of a generator

I've been learning about generators in Python recently and have a question. I've used iterators before when learning Java, so I know how they basically work.
So I understand what's going on here in this question: Python for loop and iterator behavior
Essentially, once the for loop traverses the iterator, the iterator stops there, so another for loop would pick up at the end of it (and result in nothing being printed out). That is clear to me.
I'm also aware of the tee function from itertools, which lets me "duplicate" a generator. I found this helpful when I want to check whether a generator is empty before doing anything to it (as I can check whether the duplicate, in list form, is empty).
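For reference, a tiny demonstration of both points, exhaustion and tee's independent copies:
from itertools import tee

gen = (n for n in range(3))
print(list(gen))  # [0, 1, 2]
print(list(gen))  # [] -- the generator is exhausted

a, b = tee(n for n in range(3))
print(list(a))    # [0, 1, 2]
print(list(b))    # [0, 1, 2] -- tee buffered an independent copy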
In a code I'm writing, I need to create many of the same generators at different instances throughout the code, so my line of thought was: why don't I write a method that makes a generator? So every time I need a new one, I can call that method. Maybe my misunderstanding has to do with this "generator creation" process but that seemed right to me.
Here is the code I'm using. When I first call the method and duplicate it using tee, everything works fine, but then once I call it again after looping through it, the method returns an empty generator. Does this "using a method" workaround not work?
node_list = []
generate_hash_2, temp = tee(generate_nodes(...))
for node in list(temp):
    node_list.append(...)
print("Generate_hash_2:{}".format(node_list))

for node in generate_hash_2:
    if node.hash_value == x:
        print(x)

node_list2 = []
generate_hash_3, temp2 = tee(generate_nodes(...))  # exact same parameters as before
for node in list(temp2):
    node_list2.append(...)
print("Generate_hash_3:{}".format(node_list2))
def generate_nodes(nodes, type):
    for node in nodes:
        if isinstance(node.type, type):
            yield node
Please ignore the poor variable name choices, but Generate_hash_2 prints out fine while Generate_hash_3 prints an empty list, despite the calls taking identical parameters :( Note that the inside of the for loop doesn't modify any of the items or anything. Any help or explanation would be great!
For a sample XML file, I have this:
<top></top>
For a sample output, I'm getting:
Generate_hash_2:["XPath:/*[name()='top'][1], Node Type:UnknownNode, Tag:top, Text:"]
Generate_hash_3:[]
If you are interested in helping me understand this further, I've been writing these methods to get an understanding of the files in here: https://github.com/mmoosstt/XmlXdiff/tree/master/lib/diffx , specifically the differ.py file
The code in that file constantly calls the _gen_dx_nodes() method (with the same parameters), which is a method that creates a generator. But that code's generators never seem to "end" and force the writer to do something to reset them. So I'm confused about why this happens to me (I've been running into my problem when calling that method from different methods in succession). I've also been using the same test cases, so I'm pretty lost here on how to fix this. Any help would be great!

Use of a temporary variable vs repeatedly read same key/value from a dictionary

Background: I need to read the same key/value from a dictionary (exactly) twice.
Question: There are two ways, as shown below,
Method 1. Read it with the same key twice, e.g.,
sample_map = {'A': 1,}
...
if sample_map.get('A', None) is not None:
    print("A's value in map is {}".format(sample_map.get('A')))
Method 2. Read it once and store it in a local variable, e.g,
sample_map = {'A': 1,}
...
ret_val = sample_map.get('A', None)
if ret_val is not None:
    print("A's value in map is {}".format(ret_val))
Which way is better? What are their Pros and Cons?
Note that I am aware that print() can naturally handle ret_val of None. This is a hypothetical example and I just use it for illustration purposes.
Under these conditions, I wouldn't use either. What you're really interested in is whether A is a valid key, and the KeyError (or lack thereof) raised by __getitem__ will tell you if it is or not.
try:
    print("A's value in map is {}".format(sample_map['A']))
except KeyError:
    pass
Of course, some would say there is too much code in the try block, in which case method 2 would be preferable.
try:
    ret_val = sample_map['A']
except KeyError:
    pass
else:
    print("A's value in map is {}".format(ret_val))
or the code you already have:
ret_val = sample_map.get('A')  # None is the default value for the second argument
if ret_val is not None:
    print("A's value in map is {}".format(ret_val))
There isn't any effective difference between the options you posted.
See also: Python: List vs Dict for look up table
Lookups in a dict are about O(1). The same goes for a variable you have stored.
Efficiency is about the same. In this case, I would skip defining the extra variable, since not much else is going on.
But in a case like the one below, where there are a lot of dict lookups going on, I plan to refactor the code to make things more intelligible, as all of the lookups clutter and obfuscate the logic:
# At this point, assuming that these are floats is OK, since no thresholds had text values
if vname in paramRanges:
    # Making sure the variable is one that we have a threshold for
    # We might care about it
    # Don't check the equal case, because that won't matter
    if float(tblChanges[vname][0]) < float(tblChanges[vname][1]):
        # Check lower tolerance
        # Distinction is important because tolerances are not always the same +/-
        if abs(float(tblChanges[vname][0]) - float(tblChanges[vname][1])) >= float(paramRanges[vname][2]):
            # Difference from default is greater than tolerance
            # vname: current value, default value, negative tolerance, tolerance units, change date
            alerts[vname] = (
                float(tblChanges[vname][0]), float(tblChanges[vname][1]), float(paramRanges[vname][2]),
                paramRanges[vname][0], tblChanges[vname][2]
            )
    else:
        # Check upper tolerance
        if abs(float(tblChanges[vname][0]) - float(tblChanges[vname][1])) >= float(paramRanges[vname][1]):
            alerts[vname] = (
                float(tblChanges[vname][0]), float(tblChanges[vname][1]), float(paramRanges[vname][1]),
                paramRanges[vname][0], tblChanges[vname][2]
            )
In most cases—if you can't just rewrite your code to use EAFP as chepner suggests, which you probably can for this example—you want to avoid repeated method calls.
The only real benefit of repeating the get is saving an assignment statement.
If your code isn't crammed in the middle of a complex expression, that just means saving one line of vertical space—which isn't nothing, but isn't a huge deal.
If your code is crammed in the middle of a complex expression, pulling the get out may force you to rewrite things a bit. You may have to, e.g., turn a lambda into a def, or turn a while loop with a simple condition into a while True: with an if …: break. Usually that's a sign that you, e.g., really wanted a def in the first place, but "usually" isn't "always". So, this is where you might want to violate the rule of thumb—but see the section at the bottom first.
On the other side…
For dict.get, the performance cost of repeating the method is pretty tiny, and unlikely to impact your code. But what if you change the code to take an arbitrary mapping object from the caller, and someone passes you, say, a proxy that does a get by making a database query or an RPC to a remote server?
For single-threaded code, calling dict.get with the same arguments twice in a row without doing anything in between is correct. But what if you're taking a dict passed by the caller, and the caller has a background thread also modifying the same dict? Then your code is only correct if you put a Lock or other synchronization around the two accesses.
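As a sketch, assuming the caller also hands you the lock that guards the dict:
import threading

def report(sample_map, map_lock):
    with map_lock:  # hold the lock across both accesses
        if sample_map.get('A') is not None:
            print("A's value in map is {}".format(sample_map.get('A')))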
Or, what if your expression was something that might mutate some state, or do something dangerous?
Even if nothing like this is ever going to be an issue in your code, unless that fact is blindingly obvious to anyone reading your code, they're still going to have to think about the possibility of performance costs and ToCToU races and so on.
And, of course, it makes at least two of your lines longer. Assuming you're trying to write readable code that sticks to 72 or 79 or 99 columns, horizontal space is a scarce resource, while vertical space is much less of a big deal. I think your second version is easier to scan than your first, even without all of these other considerations, but imagine making the expression, say, 20 characters longer.
In the rare cases where pulling the repeated value out of an expression would be a problem, you still often want to assign it to a temporary.
Unfortunately, up to Python 3.7, you usually can't. It's either clumsy (e.g., requiring an extra nested comprehension or lambda just to give you an opportunity to bind a variable) or impossible.
But in Python 3.8, PEP 572 assignment expressions handle this case.
if (sample := sample_map.get('A', None)) is not None:
    print("A's value in map is {}".format(sample))
I don't think this is a great use of an assignment expression (see the PEP for some better examples), especially since I'd probably write this the way chepner suggested… but it does show how to get the best of both worlds (assigning a temporary, and being embeddable in an expression) when you really need to.

Python generators and coroutines

I am studying coroutines and generators in various programming languages.
I was wondering if there is a cleaner way to combine together two coroutines implemented via generators than yielding back at the caller whatever the callee yields?
Let's say that we are using the following convention: all yields apart from the last one return None, while the last one returns the result of the coroutine. So, for example, we could have a coroutine that invokes another:
def A():
    # yield until a certain condition is met
    yield result

def B():
    # do something that may or may not yield
    x = bind(A())
    # ...
    return result
in this case I wish that through bind (which may or may not be implementable, that's the question) the coroutine B yields whenever A yields until A returns its final result, which is then assigned to x allowing B to continue.
I suspect that the actual code should explicitly iterate A so:
def B():
    # do something that may or may not yield
    for x in A(): ()
    # ...
    return result
which is a tad ugly and error prone...
PS: it's for a game where the users of the language will be the designers who write scripts (script = coroutine). Each character has an associated script, and there are many sub-scripts which are invoked by the main script; consider that, for example, run_ship invokes many times reach_closest_enemy, fight_with_closest_enemy, flee_to_allies, and so on. All these sub-scripts need to be invoked the way you describe above; for a developer this is not a problem, but for a designer the less code they have to write the better!
Edit: I recommend using Greenlet. But if you're interested in a pure Python approach, read on.
This is addressed in PEP 342, but it's somewhat tough to understand at first. I'll try to explain simply how it works.
First, let me sum up what I think is the problem you're really trying to solve.
Problem
You have a callstack of generator functions calling other generator functions. What you really want is to be able to yield from the generator at the top, and have the yield propagate all the way down the stack.
The problem is that Python does not (at a language level) support real coroutines, only generators. (But, they can be implemented.) Real coroutines allow you to halt an entire stack of function calls and switch to a different stack. Generators only allow you to halt a single function. If a generator f() wants to yield, the yield statement has to be in f(), not in another function that f() calls.
The solution that I think you're using now, is to do something like in Simon Stelling's answer (i.e. have f() call g() by yielding all of g()'s results). This is very verbose and ugly, and you're looking for syntax sugar to wrap up that pattern. Note that this essentially unwinds the stack every time you yield, and then winds it back up again afterwards.
Solution
There is a better way to solve this problem. You basically implement coroutines by running your generators on top of a "trampoline" system.
To make this work, you need to follow a couple patterns:
1. When you want to call another coroutine, yield it.
2. Instead of returning a value, yield it.
so
def f():
    result = g()
    # …
    return return_value
becomes
def f():
    result = yield g()
    # …
    yield return_value
Say you're in f(). The trampoline system called f(). When you yield a generator (say g()), the trampoline system calls g() on your behalf. Then when g() has finished yielding values, the trampoline system restarts f(). This means that you're not actually using the Python stack; the trampoline system manages a callstack instead.
When you yield something other than a generator, the trampoline system treats it as a return value. It passes that value back to the caller generator through the yield statement (using .send() method of generators).
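Here is a minimal sketch of such a trampoline; it illustrates the idea rather than reproducing the full scheduler from PEP 342:
import types

def trampoline(start):
    stack = [start]
    value = None
    while stack:
        try:
            result = stack[-1].send(value)
        except StopIteration:
            stack.pop()  # generator finished without a "return" yield
            value = None
            continue
        if isinstance(result, types.GeneratorType):
            stack.append(result)  # a yielded generator is a "call"
            value = None
        else:
            stack.pop()           # anything else is a "return value"
            value = result        # resume the caller with it
    return value

def add_one(x):
    yield x + 1               # "return" x + 1

def main():
    a = yield add_one(1)      # "call" add_one; a == 2
    yield a * 10              # "return" 20

print(trampoline(main()))     # prints 20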
Comments
This kind of system is extremely important and useful in asynchronous applications, like those using Tornado or Twisted. You can halt an entire callstack when it's blocked, go do something else, and then come back and continue execution of the first callstack where it left off.
The drawback of the above solution is that it requires you to write essentially all your functions as generators. It may be better to use an implementation of true coroutines for Python - see below.
Alternatives
There are several implementations of coroutines for Python, see: http://en.wikipedia.org/wiki/Coroutine#Implementations_for_Python
Greenlet is an excellent choice. It is an extension module for CPython that allows true coroutines by swapping out the callstack.
Python 3.3 added syntax for delegating to a subgenerator (yield from); see PEP 380.
Are you looking for something like this?
def B():
    for x in A():
        if x is None:
            yield
        else:
            break
    # continue, x contains value A yielded

Python method that is also a generator function?

I'm trying to build a method that also acts like a generator function, at a flip of a switch (want_gen below).
Something like:
def optimize(x, want_gen):
    # ... declaration and validation code
    for i in range(100):
        # estimate foo, bar, baz
        # ... some code here
        x = calculate_next_x(x, foo, bar, baz)
        if want_gen:
            yield x
    if not want_gen:
        return x
But of course this doesn't work: the presence of yield makes Python treat the whole function as a generator function, so calling it always produces a generator and the return never hands x back to the caller (in older Pythons, a return with a value inside a generator was even a syntax error).
The code is quite involved, and refactoring the declaration and validation code doesn't make much sense (too many state variables -- I will end up with difficult-to-name helper routines of 7+ parameters, which is decidedly ugly). And of course, I'd like to avoid code duplication as much as possible.
Is there some code pattern that would make sense here to achieve the behaviour I want?
Why do I need that?
I have a rather complicated and time-consuming optimization routine, and I'd like to get feedback about its current state during runtime (to display in e.g. GUI). The old behaviour needs to be there for backwards compatibility. Multithreading and messaging is too much work for too little additional benefit, especially when cross-platform operation is necessary.
Edit:
Perhaps I should have mentioned that since each optimization step is rather lengthy (there are some numerical simulations involved as well), I'd like to be able to "step in" at a certain iteration and twiddle some parameters, or abort the whole business altogether. The generators seemed like a good idea, since I could launch another iteration at my discretion, fiddling in the meantime with some parameters.
Since all you seem to want is some sort of feedback for a long running function, why not just pass in a reference to a callback procedure that will be called at regular intervals?
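A sketch of that idea; progress_callback is a made-up parameter name, and calculate_next_x is a dummy stand-in for the question's update step:
def calculate_next_x(x):
    return x * 0.9  # dummy update step, for illustration only

def optimize(x, progress_callback=None):
    for i in range(100):
        x = calculate_next_x(x)
        if progress_callback is not None:
            progress_callback(i, x)  # report progress, e.g. to a GUI
    return x

optimize(100.0, progress_callback=lambda i, x: print(i, x))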
An edit to my answer: why not just always yield? You can have a function which yields a single value. If you don't want that, then choose to have your function either return a generator itself or the value:
def stuff(x, want_gen):
    if want_gen:
        def my_gen(x):
            # code with yield
            ...
        return my_gen
    else:
        return x
That way you are always returning a value. In Python, functions are objects.
Well... we can always remember that yield was implemented in the language as a way to facilitate the existence of generator objects, but one can always implement generators either from scratch, or get the best of both worlds:
class Optimize(object):
    def __init__(self, x):
        self.x = x
    def __iter__(self):
        x = self.x
        # ... declaration and validation code
        for i in range(100):
            # estimate foo, bar, baz
            # ... some code here
            x = calculate_next_x(x, foo, bar, baz)
            yield x
    def __call__(self):
        result = None
        for result in iter(self):  # exhaust the generator
            pass
        return result              # the final value

def optimize(x, wantgen):
    if wantgen:
        return iter(Optimize(x))
    else:
        return Optimize(x)()
Note that you don't even need the "optimize" function wrapper - I just put it there so it becomes a drop-in replacement for your example (assuming it worked).
The way the class is declared, you can do simply:
for y in Optimize(x):
    # code
to use it as a generator, or:
k = Optimize(x)()
to use it as a function.
Kind of messy, but I think this does the same as your original code was asking:
def optimize(x, want_gen):
def optimize_gen(x):
# ... declaration and validation code
for i in range(100):
# estimate foo, bar, baz
# ... some code here
x = calculate_next_x(x, foo, bar, baz)
if want_gen:
yield x
if want_gen:
return optimize_gen(x)
for x in optimize_gen(x):
pass
return x
Alternatively the for loop at the end could be written:
return list(optimize_gen(x))[-1]
Now ask yourself if you really want to do this. Why do you sometimes want the whole sequence and sometimes only want the last element? Smells a bit fishy to me.
It's not completely clear what you want to happen if you switch between generator and function modes.
But as a first try: perhaps wrap the generator version in a new method which explicitly throws away the intermediate steps?
def gen():
    for i in range(100):
        yield i

def wrap():
    for x in gen():
        pass
    return x

print("wrap=", wrap())
With this version you could step into gen() by looping over smaller numbers of the range, make adjustments, and then use wrap() only when you want to finish up.
Simplest is to write two methods, one the generator and the other calling the generator and just returning the value. If you really want one function with both possibilities, you can always use the want_gen flag to test what sort of return value, returning the iterator produced by the generator function when True and just the value otherwise.
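A sketch of that two-function split, using the question's placeholder names:
def optimize_iter(x):
    # generator version: yields every intermediate x
    for i in range(100):
        x = calculate_next_x(x, foo, bar, baz)  # as in the question
        yield x

def optimize(x):
    # plain version: exhaust the generator, keep only the final value
    for x in optimize_iter(x):
        pass
    return x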
How about this pattern: make your three-line change to convert the function to a generator and rename it NewFunctionName. Then replace the existing function with one that either returns the generator if want_gen is True, or exhausts the generator and returns the final value.
