Handle generator exceptions in its consumer - python

This is a follow-up to Handle an exception thrown in a generator and discusses a more general problem.
I have a function that reads data in different formats. All formats are line- or record-oriented and for each format there's a dedicated parsing function, implemented as a generator. So the main reading function gets an input and a generator, which reads its respective format from the input and delivers records back to the main function:
def read(stream, parsefunc):
    for record in parsefunc(stream):
        do_stuff(record)
where parsefunc is something like:
def parsefunc(stream):
    while not eof(stream):
        rec = read_record(stream)
        # do some stuff
        yield rec
The problem I'm facing is that while parsefunc can throw an exception (e.g. when reading from a stream), it has no idea how to handle it. The function responsible for handling exceptions is the main read function. Note that exceptions occur on a per-record basis, so even if one record fails, the generator should continue its work and yield records back until the whole stream is exhausted.
In the previous question I tried to put next(parsefunc) in a try block, but as it turned out, this is not going to work. So I have to add try-except to parsefunc itself and then somehow deliver exceptions to the consumer:
def parsefunc(stream):
    while not eof(stream):
        try:
            rec = read_record(stream)
            yield rec
        except Exception as e:
            ?????
I'm rather reluctant to do this because
it makes no sense to use try in a function that isn't intended to handle any exceptions
it's unclear to me how to pass exceptions to the consuming function
there are going to be many formats and many parsefuncs, and I don't want to clutter them with too much helper code.
Does anyone have suggestions for a better architecture?
A note for googlers: in addition to the top answer, pay attention to senderle's and Jon's posts - very smart and insightful stuff.

You can return a tuple of record and exception in the parsefunc and let the consumer function decide what to do with the exception:
import random
def get_record(line):
    num = random.randint(0, 3)
    if num == 3:
        raise Exception("3 means danger")
    return line
def parsefunc(stream):
    for line in stream:
        try:
            rec = get_record(line)
        except Exception as e:
            yield (None, e)
        else:
            yield (rec, None)
if __name__ == '__main__':
    with open('temp.txt') as f:
        for rec, e in parsefunc(f):
            if e:
                print("Got an exception %s" % e)
            else:
                print("Got a record %s" % rec)

Thinking deeper about what would happen in a more complex case kind of vindicates the Python choice of avoiding bubbling exceptions out of a generator.
If I got an I/O error from a stream object the odds of simply being able to recover and continue reading, without the structures local to the generator being reset in some way, would be low. I would somehow have to reconcile myself with the reading process in order to continue: skip garbage, push back partial data, reset some incomplete internal tracking structure, etc.
Only the generator has enough context to do that properly. Even if you could keep the generator context, having the outer block handle the exceptions would totally flout the Law of Demeter. All the important information that the surrounding block needs to reset and move on is in local variables of the generator function! And getting or passing that information, though possible, is disgusting.
The resulting exception would almost always be thrown after cleaning up, in which case the reader-generator will already have an internal exception block. Trying very hard to maintain this cleanliness in the brain-dead-simple case only to have it break down in almost every realistic context would be silly. So just have the try in the generator, you are going to need the body of the except block anyway, in any complex case.
It would be nice if exceptional conditions could look like exceptions, though, and not like return values. So I would add an intermediate adapter to allow for this: The generator would yield either data or exceptions and the adapter would re-raise the exception if applicable. The adapter should be called first-thing inside the for loop, so that we have the option of catching it within the loop and cleaning up to continue, or breaking out of the loop to catch it and abandon the process. And we should put some kind of lame wrapper around the setup to indicate that tricks are afoot, and to force the adapter to get called if the function is adapting.
That way each layer is presented errors that it has the context to handle, at the expense of the adapter being a tiny bit intrusive (and perhaps also easy to forget).
So we would have:
def read(stream, parsefunc):
    try:
        for source in frozen(parsefunc(stream)):
            try:
                record = source.thaw()
                do_stuff(record)
            except Exception as e:
                log_error(e)
                if not is_recoverable(e):
                    raise
                recover()
    except Exception as e:
        properly_give_up()
    wrap_up()
(Where the two try blocks are optional.)
The adapter looks like:
class Frozen(object):
    def __init__(self, item):
        self.value = item
    def thaw(self):
        # re-raise a stored exception, otherwise return the stored record
        if isinstance(self.value, Exception):
            raise self.value
        return self.value
def frozen(generator):
    for item in generator:
        yield Frozen(item)
And parsefunc looks like:
def parsefunc(stream):
    while not eof(stream):
        try:
            rec = read_record(stream)
            do_some_stuff()
            yield rec
        except Exception as e:
            properly_skip_record_or_prepare_retry()
            yield e
To make it harder to forget the adapter, we could also change frozen from a function to a decorator on parsefunc.
def frozen_results(func):
    def freezer(*args, **kw):
        for item in func(*args, **kw):
            yield Frozen(item)
    return freezer
In which case we would declare:
@frozen_results
def parsefunc(stream):
    ...
And we would obviously not bother to declare frozen, or wrap it around the call to parsefunc.
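Putting the pieces together, here is a minimal self-contained sketch of the adapter in Python 3. Only Frozen and frozen_results follow the names used above; the parse_numbers generator, the simplified read function and the sample input are made up for illustration and are not part of the original design.

```python
import functools


class Frozen(object):
    """Holds either a record or an exception yielded by a generator."""

    def __init__(self, item):
        self.value = item

    def thaw(self):
        # Re-raise stored exceptions in the consumer; otherwise return the record.
        if isinstance(self.value, Exception):
            raise self.value
        return self.value


def frozen_results(func):
    """Decorator: wrap every item the generator yields in Frozen."""
    @functools.wraps(func)
    def freezer(*args, **kw):
        for item in func(*args, **kw):
            yield Frozen(item)
    return freezer


@frozen_results
def parse_numbers(lines):
    # Hypothetical parser: yields floats, or the exception for a bad line.
    for line in lines:
        try:
            yield float(line)
        except ValueError as e:
            yield e


def read(lines):
    records, errors = [], []
    for source in parse_numbers(lines):
        try:
            records.append(source.thaw())
        except ValueError as e:
            errors.append(e)  # the generator keeps going after a bad record
    return records, errors


records, errors = read(['4', '5h', '22'])
print(records)      # [4.0, 22.0]
print(len(errors))  # 1
```

The consumer sees a real exception at the point where it calls thaw(), while the generator itself never has an exception propagate out of it, so it can keep producing records.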

Without knowing more about the system, I think it's difficult to tell what approach will work best. However, one option that no one has suggested yet would be to use a callback. Given that only read knows how to deal with exceptions, might something like this work?
def read(stream, parsefunc):
    some_closure_data = {}
    def error_callback_1(e):
        manipulate(some_closure_data, e)
    def error_callback_2(e):
        transform(some_closure_data, e)
    for record in parsefunc(stream, error_callback_1):
        do_stuff(record)
Then, in parsefunc:
def parsefunc(stream, error_callback):
    while not eof(stream):
        try:
            rec = read_record(stream)
            yield rec
        except Exception as e:
            error_callback(e)
I used a closure over a mutable local here; you could also define a class. Note also that you can access the traceback info via sys.exc_info() inside the callback.
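As a rough sketch of the class-based variant, the callback below is a callable object that records each exception together with its formatted traceback. ErrorCollector, the float-parsing parsefunc and the sample input are hypothetical names chosen for this example only.

```python
import sys
import traceback


class ErrorCollector(object):
    """Hypothetical callable that stands in for the closure-based callback."""

    def __init__(self):
        self.errors = []

    def __call__(self, e):
        # Called from inside the generator's except block, so
        # sys.exc_info() still describes the exception being handled.
        exc_type, exc_value, tb = sys.exc_info()
        self.errors.append((e, traceback.format_tb(tb)))


def parsefunc(items, error_callback):
    for item in items:
        try:
            rec = float(item)
        except ValueError as e:
            error_callback(e)
        else:
            yield rec


def read(items):
    collector = ErrorCollector()
    records = list(parsefunc(items, collector))
    return records, collector.errors


records, errors = read(['1', 'x', '3'])
print(records)      # [1.0, 3.0]
print(len(errors))  # 1
```

Because the callback runs while the generator is still inside its except block, the traceback is available without having to pass it along explicitly.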
Another interesting approach might be to use send. This would work a little differently; basically, instead of defining a callback, read could check the result of yield, do a lot of complex logic, and send a substitute value, which the generator would then re-yield (or do something else with). This is a bit more exotic, but I thought I'd mention it in case it's useful:
>>> def parsefunc(it):
...     default = None
...     for x in it:
...         try:
...             rec = float(x)
...         except ValueError as e:
...             default = yield e
...             yield default
...         else:
...             yield rec
...
>>> parsed_values = parsefunc(['4', '6', '5', '5h', '22', '7'])
>>> for x in parsed_values:
...     if isinstance(x, ValueError):
...         x = parsed_values.send(0.0)
...     print(x)
...
4.0
6.0
5.0
0.0
22.0
7.0
On its own this is a bit useless ("Why not just print the default directly from read?" you might ask), but you could do more complex things with default inside the generator, resetting values, going back a step, and so on. You could even wait to send a callback at this point based on the error you receive. But note that sys.exc_info() is cleared as soon as the generator yields, so you'll have to send everything from sys.exc_info() if you need access to the traceback.
Here's an example of how you might combine the two options:
import string
digits = set(string.digits)
def digits_only(v):
    return ''.join(c for c in v if c in digits)
def parsefunc(it):
    default = None
    for x in it:
        try:
            rec = float(x)
        except ValueError as e:
            callback = yield e
            yield float(callback(x))
        else:
            yield rec
parsed_values = parsefunc(['4', '6', '5', '5h', '22', '7'])
for x in parsed_values:
    if isinstance(x, ValueError):
        x = parsed_values.send(digits_only)
    print(x)

An example of a possible design:
from io import StringIO  # Python 3; on Python 2 use "from StringIO import StringIO"
import csv
blah = StringIO('this,is,1\nthis,is\n')
def parse_csv(stream):
    for row in csv.reader(stream):
        try:
            yield int(row[2])
        except (IndexError, ValueError) as e:
            pass  # don't yield but might need something
        # All others have to go up a level - so it wasn't parsable
        # So if it's an IOError you know why, but this needs to catch
        # exceptions potentially, just let the major ones propagate
for record in parse_csv(blah):
    print(record)

I like the given answer with the Frozen stuff. Based on that idea I came up with this, solving two aspects I did not yet like. The first was the patterns needed to write it down. The second was the loss of the stack trace when yielding an exception. I tried my best to solve the first by using decorators as well as possible. I tried keeping the stack trace by using sys.exc_info() instead of the exception alone.
My generator normally (i.e. without my stuff applied) would look like this:
def generator():
    def f(i):
        return float(i) / (3 - i)
    for i in range(5):
        yield f(i)
If I can transform it into using an inner function to determine the value to yield, I can apply my method:
def generator():
    def f(i):
        return float(i) / (3 - i)
    for i in range(5):
        def generate():
            return f(i)
        yield generate()
This doesn't yet change anything and calling it like this would raise an error with a proper stack trace:
for e in generator():
    print(e)
Now, applying my decorators, the code would look like this:
@excepterGenerator
def generator():
    def f(i):
        return float(i) / (3 - i)
    for i in range(5):
        @excepterBlock
        def generate():
            return f(i)
        yield generate()
Not much changes visually. And you can still use it the way you used the version before:
for e in generator():
    print(e)
And you still get a proper stack trace when calling. (Just one more frame is in there now.)
But now you also can use it like this:
it = generator()
while it:
    try:
        for e in it:
            print(e)
    except Exception as problem:
        print('exc', problem)
This way you can handle in the consumer any exception raised in the generator without too much syntactic hassle and without losing stack traces.
The decorators are spelled out like this:
import sys
def excepterBlock(code):
    def wrapper(*args, **kwargs):
        try:
            return (code(*args, **kwargs), None)
        except Exception:
            return (None, sys.exc_info())
    return wrapper
class Excepter(object):
    def __init__(self, generator):
        self.generator = generator
        self.running = True
    def __next__(self):  # spelled next() on Python 2
        try:
            v, e = next(self.generator)
        except StopIteration:
            self.running = False
            raise
        if e:
            # re-raise with the original traceback
            # (Python 2 spelling: raise e[0], e[1], e[2])
            raise e[1].with_traceback(e[2])
        else:
            return v
    def __iter__(self):
        return self
    def __bool__(self):  # spelled __nonzero__ on Python 2
        return self.running
def excepterGenerator(generator):
    return lambda *args, **kwargs: Excepter(generator(*args, **kwargs))

(I answered the other question linked in the OP but my answer applies to this situation as well)
I have needed to solve this problem a couple of times and came upon this question after a search for what other people have done.
One option, which will probably require refactoring things a little bit, would be to create an error-handling generator and, inside the parsing generator, throw the exception to that error-handling generator rather than raise it.
Here is what the error handling generator function might look like:
def err_handler():
    # a generator for processing errors
    while True:
        try:
            # errors are thrown to this point in function
            yield
        except Exception1:
            handle_exc1()
        except Exception2:
            handle_exc2()
        except Exception3:
            handle_exc3()
        except Exception:
            raise
An additional handler argument is provided to the parsefunc function so it has a place to put the errors:
def parsefunc(stream, handler):
    # the handler argument fixes errors/problems separately
    while not eof(stream):
        try:
            rec = read_record(stream)
            # do some stuff
            yield rec
        except Exception as e:
            handler.throw(e)
    handler.close()
Now just use almost the original read function, but now with an error handler:
def read(stream, parsefunc):
    handler = err_handler()
    next(handler)  # advance to the first yield so that throw() lands inside the try block
    for record in parsefunc(stream, handler):
        do_stuff(record)
This isn't always going to be the best solution, but it's certainly an option, and relatively easy to understand.
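For a concrete feel of how this behaves, here is a small runnable sketch under some assumptions: the handler appends messages to a list called log (a hypothetical stand-in for real error handling), records are parsed with int(), and the input is a plain list. One detail worth noting is that a generator has to be advanced to its first yield before throw() is used on it; otherwise the exception is raised before the try block is entered and the handler simply dies.

```python
def err_handler(log):
    # A generator for processing errors; `log` is a hypothetical list
    # used here only to make the effect visible.
    while True:
        try:
            yield  # errors are thrown to this point
        except ValueError as e:
            log.append('bad value: %s' % e)
        except KeyError as e:
            log.append('missing key: %s' % e)


def parsefunc(items, handler):
    for item in items:
        try:
            yield int(item)
        except Exception as e:
            # Hand the error over; anything the handler does not
            # catch propagates back out of throw().
            handler.throw(e)
    handler.close()


def read(items):
    log = []
    handler = err_handler(log)
    next(handler)  # advance to the first yield so throw() lands inside the try
    records = list(parsefunc(items, handler))
    return records, log


records, log = read(['1', 'x', '3'])
print(records)   # [1, 3]
print(len(log))  # 1
```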

About your point of propagating exceptions from the generator to the consuming function:
you can try to use an error code (or a set of error codes) to indicate the error.
Though not elegant, that is one approach you can think of.
For example, in the code below, yielding a value like -1 where you were expecting
a set of positive integers would signal to the calling function that there was
an error.
In [1]: def f():
   ...:     yield 1
   ...:     try:
   ...:         2/0
   ...:     except ZeroDivisionError as e:
   ...:         yield -1
   ...:     yield 3
...:
In [2]: g = f()
In [3]: next(g)
Out[3]: 1
In [4]: next(g)
Out[4]: -1
In [5]: next(g)
Out[5]: 3

Actually, generators are quite limited in several aspects. You found one: the raising of exceptions is not part of their API.
You could have a look at the Stackless Python stuff like greenlets or coroutines which offer a lot more flexibility; but diving into that is a bit out of scope here.

Related

How to differentiate between cases of ValueError

Since too many python operations return ValueError, how can we differentiate between them?
Example: I expect an iterable to have a single element, and I want to get it
a, = [1, 2]: ValueError: too many values to unpack
a, = []: ValueError: too few values to unpack
How can I differentiate between those two cases? E.g.:
try:
    a, = lst
except ValueError as e:
    if e.too_many_values:
        do_this()
    else:
        do_that()
I realise that in this particular case I could find a work-around using length/indexing, but the point is similar cases come up often, and I want to know if there's a general approach. I also realise I could check the error message for if 'too few' in message but it seems a bit crude.
try:
    raise ValueError('my error')
except ValueError as e:
    # use str(), not repr(), see
    # https://stackoverflow.com/a/45532289/7919597
    x = getattr(e, 'message', str(e))
    if 'my error' in x:
        print('got my error')
(see also How to get exception message in Python properly)
But this might not be a clean solution after all.
The best thing would be to narrow the scope of your try block so that only one was possible. Or don't depend on exceptions to detect those error cases.
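One way to read the second suggestion is to check the condition explicitly and, where an exception is still wanted, raise a dedicated subclass so the caller can tell the cases apart. The sketch below assumes hypothetical names (TooFewValues, TooManyValues, only_element, describe); none of them come from the standard library.

```python
class TooFewValues(ValueError):
    pass


class TooManyValues(ValueError):
    pass


def only_element(iterable):
    """Return the single element, raising a distinct error for each failure case."""
    items = list(iterable)
    if not items:
        raise TooFewValues('expected exactly one element, got none')
    if len(items) > 1:
        raise TooManyValues('expected exactly one element, got %d' % len(items))
    return items[0]


def describe(lst):
    try:
        return 'ok: %r' % only_element(lst)
    except TooManyValues:
        return 'too many'
    except TooFewValues:
        return 'too few'


print(describe([1]))     # ok: 1
print(describe([1, 2]))  # too many
print(describe([]))      # too few
```

Since both classes derive from ValueError, existing code that catches ValueError keeps working.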
This isn't really an answer, because it only applies if you have some control over how the exceptions are raised. Since exceptions are just objects, you can just tack on other objects / flags to them. Not saying that this is a great thing to do or a great way of doing it:
from enum import Enum
class ValueErrorType(Enum):
    HelloType = 0
    FooType = 1
def some_func(string):
    if "Hello" in string:
        error = ValueError("\"Hello\" is not allowed in my strings!!!!")
        error.error_type = ValueErrorType.HelloType
        raise error
    elif "Foo" in string:
        error = ValueError("\"Foo\" is also not allowed!!!!!!")
        error.error_type = ValueErrorType.FooType
        raise error
try:
    some_func("Hello World!")
except ValueError as error:
    error_type_map = {
        ValueErrorType.HelloType: lambda: print("It was a HelloType"),
        ValueErrorType.FooType: lambda: print("It was a FooType")
    }
    error_type_map[error.error_type]()
I'd be curious to know if there is some way you can achieve this with exceptions where you have no control over how they're raised.

python try except as a function to evaluate expressions

I have tried creating a function that tries an expression and returns zero if errors are raised.
def try_or_zero(exp):
    try:
        exp
        return exp
    except:
        return 0
Which obviously doesn't work. It seems the problem is that python doesn't have any form of lazy evaluation, so the expression is evaluated before it's passed to the function and so it raises the error before it gets into the function and therefore it never passes through the try logic.
Does anyone know if this can be done in Python?
Cheers
"It seems the problem is that python doesn't have any form of lazy evaluation"
Err... yes it does, but possibly not in the form you expect. Function arguments ARE indeed eval'd before being passed to the function, so
try_or_zero(foo.bar())
will indeed be executed as:
param = foo.bar()
try_or_zero(param)
Now python functions are plain objects (they can be used as variables, passed around as arguments to functions etc), and they are only invoked when applying the call operator (the parens, with or without arguments) so you can pass a function to try_or_zero and let try_or_zero call the function:
def try_or_zero(func):
    try:
        return func()
    except Exception as e:
        return 0
Now you're going to object that 1/ this will not work if the function expects arguments and 2/ having to write a function just for this is a PITA - and both objections are valid. Fortunately, Python also has a shortcut to create simple anonymous functions consisting of a single (even if arbitrarily complex) expression: lambda. Also, python functions (including "lambda functions" - which are, technically, plain functions) are closures - they capture the context in which they're defined - so it's quite easy to wrap all this together:
a = 42
b = "c"
def add(x, y):
    return x + y
result = try_or_zero(lambda: add(a, b))
A side note about exception handling:
First, don't use a bare except; at least catch Exception (otherwise you might prevent some exceptions, like SystemExit, from working as expected).
Also preferably only catch the exact exceptions you expect at a given point. In your case, you may want to pass a tuple of exceptions that you want to ignore, ie:
def try_or_zero(func, *exceptions):
    if not exceptions:
        exceptions = (Exception,)
    try:
        return func()
    except exceptions as e:
        return 0
a = 42
b = "c"
def add(x, y):
    return x + y
result = try_or_zero(lambda: add(a, b), TypeError)
which will prevent your code from masking unexpected errors.
And finally: you may also want to add support for a return value other than zero in the case of an exception (not all expressions are supposed to return an int ):
# XXX : python3 only, python2 doesn't accept
# keyword args after *args
def try_or(func, *exceptions, default=0):
    if not exceptions:
        exceptions = (Exception,)
    try:
        return func()
    except exceptions as e:
        return default
# adding lists is legit too,
# so here you may want an empty list as the return value
# instead
a = [1, 2, 3]
# but only to lists
b = ""
result = try_or(lambda: a + b, TypeError, default=[])
No need to bother with exec and stuff, use the fact that python functions are objects and thus can be passed as arguments
def try_or_zero(exp):
    try:
        return exp()
    except:
        return 0
And just call try_or_zero(my_awesome_func) (without the () for your method)
Pass the expression to the function as a string and call exec inside the function:
def try_or_zero(exp):
    try:
        exec(exp)
        return exp
    except:
        return 0
so your call to the function will look like this:
try_or_zero('1==2')
You can achieve this by enveloping your expressions in a function.
For example:
def ErrorTest():
    # the expression you want to try
    raise Exception
Also your try function should look like this:
def try_or_zero(exp):
    try:
        exp()  # note the parentheses
    except:
        return 0
Then pass the function to it:
try_or_zero(ErrorTest)
Output: 0
You can also do it by using the eval() function, but you will have to put your code in a string.
def try_or_zero(exp):
    try:
        eval(exp)  # exp must be a string containing an expression, for example '1/0'
    except:
        return 0

Try-clause containing multiple statements

Let's say I have the following function/method, which calculates a bunch of stuff and then sets a lot of variables/attributes: calc_and_set(obj).
Now what I would like to do is to call the function several times with different objects, and if one or more fails then nothing should be set at all.
I thought I could do it like this:
try:
    calc_and_set(obj1)
    calc_and_set(obj2)
    calc_and_set(obj3)
except:
    pass
But this obviously doesn't work. If for instance the error happens in the third call to the function, then the first and second call will already have set the variables.
Can anyone think of a "clean" way of doing what I want? The only solutions I can think of are rather ugly workarounds.
I see a few options here.
A. Have a "reverse function", which is robust. So if
def calc_and_set(obj):
    obj.A = 'a'
def unset(obj):
    if hasattr(obj, 'A'):
        del obj.A
and
try:
    calc_and_set(obj1)
    calc_and_set(obj2)
except:
    unset(obj1)
    unset(obj2)
Notice, that in this case, unset doesn't care if calc_and_set completed successfully or not.
B. Separate calc_and_set to try_calc_and_set, testing if it works, and set, which won't throw errors, and would be called only if all try_calc_and_set didn't fail.
try:
    try_calc_and_set(obj1)
    try_calc_and_set(obj2)
    calc_and_set(obj1)
    calc_and_set(obj2)
except:
    pass
C. (my favorite) - have calc_and_set return a new variable, and not operate in place. If successful, replace the original reference with the new one. This could easily be done by adding copy as the first statement in calc_and_set, and then returning the variable.
try:
    obj1_t = calc_and_set(obj1)
    obj2_t = calc_and_set(obj2)
    obj1 = obj1_t
    obj2 = obj2_t
except:
    pass
The mirror of that one is of course to save your objects before:
from copy import deepcopy
obj1_c = deepcopy(obj1)
obj2_c = deepcopy(obj2)
try:
    calc_and_set(obj1)
    calc_and_set(obj2)
except:
    obj1 = obj1_c
    obj2 = obj2_c
And as a general comment (if this is just a sample code, forgive me) - don't have excepts without specifying exception type.
You can also try caching the actions you want to take and then do them all in one go if everybody passes:
from functools import partial
def do_something(obj, val):
    pass  # magic here
def validate(obj):
    if obj.is_what_you_want():
        # val is assumed to be defined by the surrounding code
        return partial(do_something, obj, val)
    else:
        raise ValueError("unable to process %s" % obj)
instructions = [validate(item) for item in your_list_of_objects]
for each_partial in instructions:
    each_partial()
The operations will only get fired if the list comprehension collects without any exceptions. You could wrap that for exception safety:
try:
    instructions = [validate(item) for item in your_list_of_objects]
    for each_partial in instructions:
        each_partial()
    print("succeeded")
except ValueError:
    print("failed")
If there is no "built-in" way of doing this, I think after all the "cleanest" solution is to divide the function into two parts. Something like this:
try:
    res1 = calc(obj1)
    res2 = calc(obj2)
    res3 = calc(obj3)
except:
    pass
else:
    set(obj1, res1)
    set(obj2, res2)
    set(obj3, res3)
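A minimal runnable sketch of this compute-then-commit split might look as follows. The Thing class, the doubling calculation, and the names apply_result and calc_and_set_all are all hypothetical stand-ins (apply_result is used instead of set to avoid shadowing the builtin).

```python
class Thing(object):
    # Hypothetical object used only for this demo.
    def __init__(self, raw):
        self.raw = raw
        self.value = None


def calc(obj):
    # Pure computation: may raise, but never modifies obj.
    return int(obj.raw) * 2


def apply_result(obj, result):
    # Simple assignment that is not expected to fail.
    obj.value = result


def calc_and_set_all(objs):
    try:
        results = [calc(obj) for obj in objs]
    except ValueError:
        return False  # nothing has been modified
    for obj, result in zip(objs, results):
        apply_result(obj, result)
    return True


good = [Thing('1'), Thing('2')]
bad = [Thing('1'), Thing('x')]
print(calc_and_set_all(good), [t.value for t in good])  # True [2, 4]
print(calc_and_set_all(bad), [t.value for t in bad])    # False [None, None]
```

The all-or-nothing guarantee only holds as long as the commit step itself cannot fail part-way through.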

Python's way to store a default value if expression failed

What is the python analog of perl's // operator?
In perl, one can do something like :
$pos = $some_list[0] // 1
How do you accomplish the same in python?
In Python there is no undefined; instead, you'd get an exception if you tried to access a non-existent index in a list. As such, you can use exception handling instead:
try:
    pos = some_list[0]
except IndexError:
    pos = 1
For the first element of a sequence, you could explicitly test the sequence as a boolean (a python container is 'falsey' when empty):
pos = some_list[0] if some_list else 1
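Two related idioms, sketched here with hypothetical helper names (first_or and defined_or): next() with a default works for any iterable, not only sequences, and an explicit None check is probably the closest match to Perl's //, which falls back only when the left side is undefined rather than merely false.

```python
def first_or(iterable, default):
    # next() returns `default` instead of raising StopIteration
    # when the iterable is empty.
    return next(iter(iterable), default)


def defined_or(value, default):
    # Fall back only when the value is None, so 0 and '' are kept.
    return default if value is None else value


print(first_or([], 1))      # 1
print(first_or([7, 8], 1))  # 7
print(defined_or(0, 1))     # 0
print(defined_or(None, 1))  # 1
```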
How about using exceptions?
try:
    pos = some_list[0]
except (NameError, IndexError):
    pos = 1
An alternative to try/catch answers above for dictionaries is the default argument on .get():
param_value = my_dictionary.get(param_key, default_value)
The best practice for this in python is to handle exceptions explicitly with a try/except clause. Here is an example to help you visualize it:
my_list = []
try:
    item = my_list[1]
except IndexError:
    item = 1
Here the code executes and an exception is raised because the index "1" is out of bounds. We then go on to handle that exception and set item=1 allowing the program to continue running. The reason for this explicit handling of exceptions is so we as programmers see exactly what is causing our problems. Take this for example:
my_list = [0]
try:
    item = 1/my_list[0]
except IndexError:
    item = 1
This will raise a zero division error (halting execution) and let us know that we need to handle some other exception explicitly beyond the original exception we expected, the IndexError. We might then do something like this to deal with that situation:
my_list = [0]
try:
    item = 1/my_list[0]
except IndexError:
    item = 1
except ZeroDivisionError:
    item = 99999
try-except blocks also have a few other notable features we can exploit:
try:
    # code which might raise error
    pass
except IndexError as err:
    # handling an index error and storing the exception in err
    pass
except ZeroDivisionError:
    # handling some other error
    pass
else:
    # code we would like to execute if the try block succeeds without any errors
    pass
finally:
    # code we will execute regardless of what occurs in the entire
    # try/except/else block listed above (e.g. we can ensure a file is closed)
    pass

How to try several methods with flat style?

If I want to try many ways to avoid some error, I might write:
try:
    try:
        trial_1()
    except some_error:
        try:
            trial_2()
        except some_error:
            try:
                trial_3()
            ...
    print("finally pass")
except some_error:
    print("still fail")
But there are too many trials and so too much nesting. How can I write it in a flat style?
If it's the same exception each time, you could do
for task in (trial_1, trial_2, trial_3, ...):
    try:
        task()
        break
    except some_error:
        continue
If knowing whether it succeeded is important, the clearest way to add that is probably
successful = False
for task in (trial_1, trial_2, trial_3, ...):
    try:
        task()
        successful = True
        break
    except some_error:
        continue
if successful:
    ...
else:
    ...
You could do this:
def trial1 (): 42 / 0
def trial2 (): [] [42]
def trial3 (): 'yoohoo!'
def trial4 (): 'here be dragons'
for t in [trial1, trial2, trial3, trial4]:
    print ('Trying {}.'.format (t.__name__) )
    try:
        t ()
        print ('Success')
        break
    except Exception as ex:
        print ('Failed due to {}'.format (ex) )
else:
    print ('Epic fail!')
Output is:
Trying trial1.
Failed due to division by zero
Trying trial2.
Failed due to list index out of range
Trying trial3.
Success
Assuming that a) each trial is different, and b) they all throw the same error (since that's what your code illustrates), and c) you know the names of all the trial functions:
for t in (trial_1, trial_2, trial_3, trial_4):
    try:
        t()
        # if we succeed, the trial is over
        break
    except some_error:
        continue
This will loop over every trial, continuing in the case of the expected error, stopping if the trial succeeds, and throwing any other exceptions. I think that's the same behavior of your example code.
If you need to do this more than once, you can wrap up the answers Hyperboreus and others gave as a function:
def first_success(*callables):
    for f in callables:
        try:
            return f()
        except Exception as x:
            print('{} failed due to {}'.format(f.__name__, x))
    raise RuntimeError("still fail")
Then, all you need is:
first_success(trial_1, trial_2, trial_3, trial_4)
If you want to logging.info the exceptions instead of print them, or ignore them entirely, or keep track of them and attach the list of exceptions to the return value and/or exception as an attribute, etc., it should be pretty obvious how to modify this.
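As one possible variation, the sketch below logs each failure with logging.info and attaches the collected exceptions to the final RuntimeError. The errors attribute and the three trial functions are assumptions made for this example.

```python
import logging


def first_success(*callables):
    errors = []
    for f in callables:
        try:
            return f()
        except Exception as x:
            logging.info('%s failed due to %s', f.__name__, x)
            errors.append(x)
    failure = RuntimeError('still fail')
    failure.errors = errors  # hypothetical attribute carrying every failure
    raise failure


def trial_1():
    return 1 / 0


def trial_2():
    return [][42]


def trial_3():
    return 'yoohoo!'


print(first_success(trial_1, trial_2, trial_3))  # yoohoo!

try:
    first_success(trial_1, trial_2)
except RuntimeError as e:
    print([type(x).__name__ for x in e.errors])  # ['ZeroDivisionError', 'IndexError']
```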
If you want to pass arguments to the functions, that's not quite as obvious, but still pretty easy. You just need to decide what the interface should be. Maybe take a sequence of callables as the first argument, and then all the callables' arguments after that:
first_success((trial_1, trial_2, trial_3, trial_4), 42, spam='spam')
That's easy:
def first_success(callables, *args, **kwargs):
    for f in callables:
        try:
            return f(*args, **kwargs)
        except Exception as x:
            print('{} failed due to {}'.format(f.__name__, x))
    else:
        raise RuntimeError("still fail")
If you don't need exactly this pattern all the time, but you need a ton of not-quite-the-same things, you may want to instead write a function that just wraps any function in a try. I've actually built this half a dozen times, and then realized there was a more pythonic way to write my code that made this function unnecessary, so the only use I've ever gotten out of it was in arguments with Haskell snobs, but you may find a better use for it:
def tried(callable, *args, **kwargs):
    try:
        return (callable(*args, **kwargs), None)
    except Exception as x:
        return (None, x)
Now you can use higher-order functions like map, any, etc. For example, map(tried, (trial_1, trial_2, trial_3, trial_4)) calls each trial and gives you a sequence of four (result, exception) pairs, and you can write (f(x[0]) if x[1] is None else x for x in tried_sequence) to work through a Haskell monad tutorial in Python, which is a good way to make both Python programmers and Haskell programmers hate you.
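To make the shape of those results concrete, here is a short sketch with two made-up trial functions; the parameter is named func rather than callable only to avoid shadowing the builtin.

```python
def tried(func, *args, **kwargs):
    # Turn "return or raise" into a (result, exception) pair.
    try:
        return (func(*args, **kwargs), None)
    except Exception as x:
        return (None, x)


def trial_1():
    return 42 / 0


def trial_2():
    return 'yoohoo!'


results = list(map(tried, (trial_1, trial_2)))
successes = [value for value, error in results if error is None]
print(successes)  # ['yoohoo!']
```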
