How do I check if my loop never ran at all?
This somehow looks too complicated to me:
x = _empty = object()
for x in data:
... # process x
if x is _empty:
raise ValueError("Empty data iterable: {!r:100}".format(data))
Isn't there an easier solution?
The above solution is from curiousefficiency.org
Update
data can contain None items.
data is an iterator, and I don't want to use it twice.
By "never ran", do you mean that data had no elements?
If so, the simplest solution is to check it before running the loop:
if not data:
raise Exception('Empty iterable')
for x in data:
...
However, as mentioned in the comments below, it will not work with some iterables, like files, generators, etc., so should be applied carefully.
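For instance, a generator object is always truthy, even when it will produce nothing, so the pre-check passes silently (a minimal sketch; empty_gen is just an illustrative name):
def empty_gen():
    return
    yield  # unreachable, but the yield makes this a generator function

data = empty_gen()
print(bool(data))   # True -- a generator object is always truthy, even if it yields nothing
print(list(data))   # []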
The original code is best.
x = _empty = object()
_empty is called a sentinel value. In Python it's common to create a sentinel with object(), since it makes it obvious that the only purpose of _empty is to be a dummy value. But you could have used any newly created mutable object, for instance an empty list [].
Two separately created mutable objects are guaranteed to be distinct when compared with is, so you can safely use them as sentinel values, unlike immutables such as None or 0.
>>> None is None
True
>>> object() is object()
False
>>> [] is []
False
I propose the following:
loop_has_run = False
for x in data:
loop_has_run = True
... # process x
if not loop_has_run:
raise ValueError("Empty data iterable: {!r:100}".format(data))
I contend that this is better than the example in the question, because:
The intent is clearer (since the variable name specifies its meaning directly).
No objects are created or destroyed (which can have a negative performance impact).
It doesn't require paying attention to the subtle point that object() always returns a unique value.
Note that the loop_has_run = True assignment should be put at the start of the loop, in case (for example) the loop body contains break.
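For example, a minimal sketch (the None check stands in for whatever condition might break early):
data = [1, None, 2]

loop_has_run = False
for x in data:
    loop_has_run = True      # set first, so an early break cannot skip it
    if x is None:
        break
    print(x)                 # process x
if not loop_has_run:
    raise ValueError("Empty data iterable")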
The following simple solution works with any iterable. It is based on the idea that we can check if there is a (first) element, and then keep iterating if there was one. The result is much clearer:
import itertools
try:
first_elmt = next(data)
except StopIteration:
raise ValueError("Empty data iterator: {!r:100}".format(data))
for x in itertools.chain([first_elmt], data):
…
PS: Note that it assumes that data is an iterator (as in the question). If it is merely an iterable, the code should be run on data_iter = iter(data) instead of on data (otherwise, say if data is a list, the loop would duplicate the first element).
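A minimal sketch of that variant, assuming data is a plain iterable such as a list:
import itertools

data = [1, 2, 3]               # any iterable
data_iter = iter(data)         # make sure we are working with an iterator
try:
    first_elmt = next(data_iter)
except StopIteration:
    raise ValueError("Empty data iterable: {!r:100}".format(data))
for x in itertools.chain([first_elmt], data_iter):
    print(x)                   # process x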
The intent of that code isn't immediately obvious. Sure people would understand it after a while, but the code could be made clearer.
The solution I offer requires more lines of code, but that code is in a class that can be stored elsewhere. In addition this solution will work for iterables and iterators as well as sized containers.
Your code would be changed to:
it = HadItemsIterable(data)
for x in it:
...
if it.had_items:
...
The code for the class is as follows:
from collections.abc import Iterable
class HadItemsIterable(Iterable):

    def __init__(self, iterable):
        self._iterator = iter(iterable)

    @property
    def had_items(self):
        try:
            return self._had_items
        except AttributeError:
            raise ValueError("Not iterated over items yet")

    def __iter__(self):
        try:
            first = next(self._iterator)
        except StopIteration:
            # PEP 479: re-raising StopIteration inside a generator becomes a
            # RuntimeError on Python 3.7+, so just end the iteration instead
            if not hasattr(self, "_had_items"):
                self._had_items = False
            return
        self._had_items = True
        yield first
        yield from self._iterator
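A quick sketch of how the class behaves (assuming the code above):
items = HadItemsIterable([])
for x in items:
    pass
print(items.had_items)   # False -- the caller can raise ValueError here

items = HadItemsIterable([1, 2])
for x in items:
    pass
print(items.had_items)   # True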
You can add a loop_flag that defaults to False; when the loop body executes, set it to True:
loop_flag = False
x = _empty = object()
for x in data:
    loop_flag = True
    ...  # process x
if loop_flag:
    print("loop executed...")
What about this solution?
data=[]
count=None
for count, item in enumerate(data):
print (item)
if count is None:
raise ValueError('data is empty')
Return in generator together with yield
Why does
yield [cand]
return
lead to different output/behavior than
return [[cand]]
Minimal viable example
uses recursion
the output of the version using yield [1]; return is different than the output of the version using return [[1]]
def foo(i):
if i != 1:
yield [1]
return
yield from foo(i-1)
def bar(i):
if i != 1:
return [[1]]
yield from bar(i-1)
print(list(foo(1))) # [[1]]
print(list(bar(1))) # []
Min viable counter example
does not use recursion
the output of the version using yield [1]; return is the same as the output of the version using return [[1]]
def foo():
yield [1]
return
def foofoo():
yield from foo()
def bar():
return [[1]]
def barbar():
yield from bar()
print(list(foofoo())) # [[1]]
print(list(barbar())) # [[1]]
Full context
I'm solving Leetcode #39: Combination Sum and was wondering why one solution works, but not the other:
Working solution
from functools import cache # requires Python 3.9+
class Solution:
def combinationSum(self, candidates: List[int], target: int) -> List[List[int]]:
@cache
def helper(targ, i=0):
if i == N or targ < (cand := candidates[i]):
return
if targ == cand:
yield [cand]
return
for comb in helper(targ - cand, i):
yield comb + [cand]
yield from helper(targ, i+1)
N = len(candidates)
candidates.sort()
yield from helper(target)
Non-working solution
from functools import cache # requires Python 3.9+
class Solution:
def combinationSum(self, candidates: List[int], target: int) -> List[List[int]]:
@cache
def helper(targ, i=0):
if i == N or targ < (cand := candidates[i]):
return
if targ == cand:
return [[cand]]
for comb in helper(targ - cand, i):
yield comb + [cand]
yield from helper(targ, i+1)
N = len(candidates)
candidates.sort()
yield from helper(target)
Output
On the following input
candidates = [2,3,6,7]
target = 7
print(Solution().combinationSum(candidates, target))
the working solution correctly prints
[[3,2,2],[7]]
while the non-working solution prints
[]
I'm wondering why yield [cand]; return works, but return [[cand]] doesn't.
In a generator function, return just defines the value associated with the StopIteration exception implicitly raised to indicate an iterator is exhausted. It's not produced during iteration, and most iterating constructs (e.g. for loops) intentionally ignore the StopIteration exception (it means the loop is over, you don't care if someone attached random garbage to a message that just means "we're done").
For example, try:
>>> def foo():
... yield 'onlyvalue' # Existence of yield keyword makes this a generator
... return 'returnvalue'
...
>>> f = foo() # Makes a generator object, stores it in f
>>> next(f) # Pull one value from generator
'onlyvalue'
>>> next(f) # There is no other yielded value, so this hits the return; iteration over
--------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
...
StopIteration: 'returnvalue'
As you can see, your return value does get "returned" in a sense (it's not completely discarded), but it's never seen by anything iterating normally, so it's largely useless. Outside of rare cases involving using generators as coroutines (where you're using .send() and .throw() on instances of the generator and manually advancing it with next(genobj)), the return value of a generator won't be seen.
In short, you have to pick one:
Use yield anywhere in a function, and it's a generator (whether or not the code path of a particular call ever reaches a yield) and return just ends generation (while maybe hiding some data in the StopIteration exception). No matter what you do, calling the generator function "returns" a new generator object (which you can loop over until exhausted), it can never return a raw value computed inside the generator function (which doesn't even begin running until you loop over it at least once).
Don't use yield, and return works as expected (because it's not a generator function).
As an example of what happens to the return value in normal looping constructs, this is roughly what for x in gen(): expands to (the real implementation is an optimized C version of this):
__unnamed_iterator = iter(gen())
while True:
try:
x = next(__unnamed_iterator)
except StopIteration: # StopIteration caught here without inspecting it
break # Loop ends, StopIteration exception cleaned even from sys.exc_info() to avoid possible reference cycles
# body of loop goes here
# Outside of loop, there is no StopIteration object left
As you can see, the expanded form of the for loop has to look for a StopIteration to indicate the loop is over, but it doesn't use it. And for anything that's not a generator, the StopIteration never has any associated value; the for loop has no way to report one even if it did (it has to end the loop when it's told iteration is over, and the arguments to StopIteration are explicitly not part of the values iterated anyway).
Anything else that consumes the generator (e.g. calling list on it) does roughly the same thing as the for loop, ignoring the StopIteration in the same way; nothing except code that specifically expects generators (as opposed to more generalized iterables and iterators) will ever bother to inspect the StopIteration object. (At the C layer, there are optimizations such that most iterators don't even produce StopIteration objects; they return NULL and leave no exception set, which everything speaking the iterator protocol knows is equivalent to returning NULL with a StopIteration set, so for anything but a generator there often isn't even an exception to inspect.)
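If you do need that return value, the two supported ways to get at it are catching the StopIteration yourself or using yield from, which evaluates to the inner generator's return value. A minimal sketch:
def gen():
    yield 1
    return 'done'

# Option 1: catch StopIteration explicitly
g = gen()
next(g)                      # 1
try:
    next(g)
except StopIteration as e:
    print(e.value)           # 'done'

# Option 2: yield from forwards the yields and evaluates to the return value
def outer():
    result = yield from gen()
    print(result)            # 'done'

list(outer())                # iterates outer, printing 'done'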
For now I have something in my code that looks like this:
def f(x):
if x == 5:
raise ValueError
else:
return 2 * x
interesting_values = range(10)
result = []
for i in interesting_values:
try:
result.append(f(i))
except ValueError:
pass
f is actually a more complex function and it fails for specific values in an unpredictable manner (I can't know if f(x) will fail or not before trying it).
What I am interested in is to have this result: the list of all the valid results of f.
I was wondering if there is a way to make the second part like a list comprehension. Of course I can't simply do this:
def f(x):
if x == 5:
raise ValueError
else:
return 2 * x
interesting_values = range(10)
result = [f(i) for i in interesting_values]
because the call for f(5) will make everything fail, but maybe there is a way to integrate the try-except structure in a list comprehension. Is it the case?
EDIT: I have control over f.
It seems like you have control of f and can modify how it handles errors.
If that's the case, and None isn't a valid output for the function, I would have it return None on an error instead of throwing:
def f(x):
if x == 5: return None
else: return 2*x
Then filter it:
results = (f(x) for x in interesting_values) # A generator expression; almost a list comprehension
valid_results = filter(lambda x: x is not None, results)
This is a stripped down version of what's often referred to as the "Optional Pattern". Return a special sentinel value on error (None in this case), else return a valid value. Normally the Optional type is a special type and the sentinel value is a subclass of that type (or something similar), but that's not necessary here.
I'm going to assume here that you have no control over the source of f. If you do, the first suggestion is to simply rewrite f not to throw exceptions, as it's clear that you are expecting that execution path to occur, which by definition makes it not exceptional. However, if you don't have control over it, read on.
If you have a function that might fail and want its "failure" to be ignored, you can always just wrap the function
def safe_f(x):
try:
return f(x)
except ValueError:
return None
result = filter(lambda x: x is not None, map(safe_f, values))
Of course, if f could return None in some situation, you'll have to use a different sentinel value. If all else fails, you could always go the route of defining your own _sentinel = object() and comparing against it.
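A minimal sketch of that sentinel variant, reusing the question's f (the names _sentinel and safe_f are just illustrative):
def f(x):
    if x == 5:
        raise ValueError
    return 2 * x

_sentinel = object()

def safe_f(x):
    try:
        return f(x)
    except ValueError:
        return _sentinel

values = range(10)
result = [y for y in map(safe_f, values) if y is not _sentinel]
print(result)   # [0, 2, 4, 6, 8, 12, 14, 16, 18]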
You could add another layer on top of your function. A decorator if you will, to transform the exception into something more usable. Actually this is a function that returns a decorator, so two additional layers:
from functools import wraps
def transform(sentinel=None, err_type=ValueError):
def decorator(f):
@wraps(f)
def func(*args, **kwargs):
try:
return f(*args, **kwargs)
except err_type:
return sentinel
return func
return decorator
@transform()
def f(...): ...
interesting = range(10)
result = [y for y in (f(x) for x in interesting) if y is not None]
This solution is tailored for the case where you get f from somewhere else. You can adjust transform to return a decorator for a given set of exceptions, and a sentinel value other than None, in case that's a valid return value. For example, if you import f, and it can raise TypeError in addition to ValueError, it would look like this:
from mystuff import f, interesting
sentinel = object()
f = transform(sentinel, (ValueError, TypeError))(f)
result = [y for y in (f(x) for x in interesting) if y is not sentinel]
You could also use the functional version of the comprehension elements:
result = list(filter(sentinel.__ne__, map(f, interesting)))
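One caveat: sentinel.__ne__ only works here because the default object.__ne__ returns NotImplemented (which is truthy) for anything that is not the sentinel, and newer Python versions warn about relying on the truth value of NotImplemented. An explicit identity check is arguably clearer (same names as above):
result = [y for y in map(f, interesting) if y is not sentinel]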
I implemented graph traversal as a generator function which yields the node being visited.
Sometimes the user needs to tell the traversal function that the edges outgoing from a particular node shouldn't be followed; in order to support that, the traversal checks the value sent back to it (using generator send() method), and if it's True, regards the node as a leaf for traversal purposes.
The problem is that the simplest user loop is kinda long:
# simplified thanks to @tobias_k
# bfs is the traversal generator function
traversal = bfs(g, start_node)
try:
n = next(traversal)
while True:
# process(n) returns True if don't want to follow edges out of n
n = traversal.send(process(n))
except StopIteration:
pass
Is there any way to improve this?
I thought something like this should work:
for n in bfs(g, start_node):
???.send(process(n))
but I feel I'm missing the knowledge of some python syntax.
I don't see a way to do this in a regular for loop. However, you could create another generator, that iterates another generator, using some "follow-function" to determine whether to follow the current element, thus encapsulating the tricky parts of your code into a separate function.
def checking_generator(generator, follow_function):
try:
x = next(generator)
while True:
yield x
x = generator.send(follow_function(x))
except StopIteration:
pass
for n in checking_generator(bfs(g, start_node), process):
print(n)
I discovered that my question would have had a one-line answer, using the extended "continue" statement proposed in the earlier version of PEP 342:
for n in bfs(g, start_node):
continue process(n)
However, while PEP 342 was accepted, that particular feature was withdrawn after this June 2005 discussion between Raymond and Guido:
Raymond Hettinger said:
Let me go on record as a strong -1 for "continue EXPR". The
for-loop is our most basic construct and is easily understood in its
present form. The same can be said for "continue" and "break" which
have the added advantage of a near zero learning curve for people
migrating from other languages.
Any urge to complicate these basic statements should be seriously
scrutinized and held to high standards of clarity, explainability,
obviousness, usefulness, and necessity. IMO, it fails most of those
tests.
I would not look forward to explaining "continue EXPR" in the tutorial
and think it would stand out as an anti-feature.
[...] The correct argument against "continue EXPR" is that there
are no use cases yet; if there were a good use case, the explanation
would follow easily.
Guido
If Python core developers have since changed their minds about the usefulness of an extended "continue", perhaps this could be reintroduced in a future PEP. But given that a nearly identical use case to the one in this question was already discussed in the quoted thread and wasn't found persuasive, it seems unlikely.
To simplify the client code, you could use an ordinary bsf() generator and check node.isleaf attribute in it:
for node in bfs(g, start_node):
node.isleaf = process(node) # don't follow if `process()` returns True
The disadvantage is that node is mutable. Or you have to pass a shared data structure that tracks leaf nodes: leaf[node] = process(node) where leaf dictionary is passed into bfs() earlier.
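A minimal sketch of that shared-dictionary variant; the toy bfs() below is written only for illustration, the real traversal is whatever bfs the question uses:
from collections import deque

def bfs(graph, start, leaf):
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        yield node                      # caller records leaf[node] while we are suspended
        if leaf.get(node):              # resumed: don't follow edges out of a leaf
            continue
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

g = {'a': ['b', 'c'], 'b': ['d'], 'c': ['e']}
leaf = {}
for node in bfs(g, 'a', leaf):
    leaf[node] = (node == 'b')          # e.g. don't follow edges out of 'b'
print(sorted(leaf))                     # ['a', 'b', 'c', 'e'] -- 'd' was never visited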
If you want to use the .send() method explicitly, you have to handle StopIteration. See PEP 479 -- Change StopIteration handling inside generators. You could hide it in a helper function:
def traverse(tree_generator, visitor):
try:
node = next(tree_generator)
while True:
node = tree_generator.send(visitor(node))
except StopIteration:
pass
Example:
traverse(bfs(g, start_node), process)
I don't see this as a frequent use case; consider this as the original generator:
def original_gen():
for x in range(10):
should_break = yield x
if should_break:
break
If the value of should_break is always calculated based on some function call with x then why not just write the generator like this:
def processing_gen(check_f):
for x in range(10):
yield x
should_break = check_f(x)
if should_break:
break
However I usually think of the code that processes the generated values as being written inside the loop (otherwise what is the point of having a loop at all?)
What it really seems you want to do is create a generator where calling the __next__ method really implies send(process(LAST_VALUE)) which can be implemented with a class:
class Followup_generator(): #feel free to use a better name
def __init__(self,generator,following_function):
self.gen = generator
self.process_f = following_function
def __iter__(self):
return self
def __next__(self):
if hasattr(self,"last_value"):
return self.send(self.process_f(self.last_value))
else:
self.last_value = next(self.gen)
return self.last_value
def send(self,arg):
self.last_value = self.gen.send(arg)
return self.last_value
def __getattr__(self,attr):
"forward other lookups to the generator (.throw etc.)"
return getattr(self.gen, attr)
# call signature is exactly the same as @tobias_k's checking_generator
traversal = Followup_generator(bfs(g, start_node), process)
for n in traversal:
print(n)
n = traversal.send(DATA) #you'd be able to send extra values to it
However, I still don't see this as a frequent use case; I'd be perfectly fine with a while loop, although I'd put the .send call at the top:
traversal = bfs(g, start_node)
send_value = None
while True:
n = traversal.send(send_value)
#code for loop, ending in calculating the next send_value
send_value = process(n)
And you might wrap that in a try: ... except StopIteration: pass, although I find that simply waiting for an exception to be raised is better expressed with a context manager:
class Catch:
def __init__(self,exc_type):
if issubclass(exc_type,BaseException):
self.catch_type = exc_type
else:
raise TypeError("can only catch Exceptions")
def __enter__(self):
return self
def __exit__(self, exc_type, err, tb):
if exc_type is not None and issubclass(exc_type, self.catch_type):
self.err = err
return True
with Catch(StopIteration):
traversal = bfs(g, start_node)
send_value = None
while True:
n = traversal.send(send_value)
#code for loop, ending in calculating the next send_value
send_value = process(n)
This is probably the answer to the question in the thread's title.
Take a look at the additional empty yield statements inside the traversal function and the custom send function, which do the magic.
# tested with Python 3.7
def traversal(n):
for i in range(n):
yield i, '%s[%s] %s' % (' ' * (4 - n), n, i)
stop = yield
if stop:
yield # here's the first part of the magic
else:
yield # the same as above
yield from traversal(int(n / 2))
def send(generator, value):
next(generator) # here's the second part of the magic
generator.send(value)
g = traversal(4)
for i, (num, msg) in enumerate(g):
print('>', i, msg)
stop = num % 2 == 0
send(g, stop)
I've written a small class SettableGenerator which uses a method to receive the value to be sent and then forwards it to the actual generator when __next__ is invoked.
With this you can write:
gen = SettableGenerator(bfs(g, start_node))
for n in gen:
gen.set(process(n))
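The class itself isn't shown here; a minimal sketch of what it might look like (the set() method and the wrapping behavior are as described above, the rest is an assumption):
class SettableGenerator:
    """Wrap a generator; set() stores a value that the next __next__() will send()."""

    def __init__(self, generator):
        self._gen = generator
        self._to_send = None

    def set(self, value):
        self._to_send = value

    def __iter__(self):
        return self

    def __next__(self):
        value, self._to_send = self._to_send, None
        if value is None:
            return next(self._gen)       # plain advance (send(None) is equivalent)
        return self._gen.send(value)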
Let's consider the following generator. It generates numbers from 0 to 9. For every generated number, it gets an input and stores it into ret:
def count_to_nine():
# Output: numbers from 0 to 9
# Input: converted numbers
ret = []
for i in range(10):
# Yield a number, get something back
val = (yield i)
# Remember that "something"
ret.append(val)
return ret
You can, indeed, iterate it using next() + send(),
but the best way is to iterate using send() alone:
g = count_to_nine()
value = None # to make sure that the first send() gives a None
while True:
value = g.send(value) # send the previously generated value, get a new one
value = f'#{value}'
Here's the result:
StopIteration: ['#0', '#1', '#2', '#3', '#4', '#5', '#6', '#7', '#8', '#9']
If you want that output, catch the StopIteration and get the result from it.
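For example, a small sketch reusing count_to_nine from above:
g = count_to_nine()
value = None
try:
    while True:
        value = g.send(value)
        value = f'#{value}'
except StopIteration as stop:
    result = stop.value
print(result)   # ['#0', '#1', ..., '#9']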
Cheers!
It is quite common to run into the situation where you have an N-sized collection but want to work with a single item (conceptually a 0- or 1-sized collection).
I could write the traditional if:
def singular_item(collection):
if collection:
return collection[0]
else:
return None
and simplify to:
def singular_item(collection):
return collection[0] if collection else None
But it would not work with iterables, only collections with a defined size. Passing a generator for instance would fail:
singular_item((_ for _ in range(10)))
=> TypeError: 'generator' object has no attribute '__getitem__'
So what I normally do is this:
def singular_item(collection):
return next((_ for _ in collection), None)
singular_item([1]) -> 1
singular_item([1,2,3]) -> 1
singular_item([]) -> None
This works well for any collection (or iterable), but it feels somewhat clumsy to create a generator just to get one item. It also hurts readability: the two other examples are much more explicit about what the code is trying to do.
So my questions are:
Is there a better way to do this, maybe by using a builtin function?
Do you waste resources when creating a generator for getting just one item?
Use the iter() function to create an iterable instead:
def singular_item(collection):
return next(iter(collection), None)
iter() calls collection.__iter__() to obtain an iterator object for next() to consume; this could be the collection object itself if it is already an iterator.
Iterators are very lightweight, and this approach is the right way to handle any iterable or sequence.
For the zero-or-one case (based on the "conceptually a 0 or 1 sized collection" wording), I'd go for:
def one(iterable, default=None):
i = iter(iterable)
fst = next(i, default)
try:
next(i)
raise ValueError('Must be only 0 or 1 values')
except StopIteration:
return fst
Is there a uniform way of knowing if an iterable object will be consumed by the iteration?
Suppose you have a certain function crunch which asks for an iterable object for parameter, and uses it many times. Something like:
def crunch (vals):
for v in vals:
chomp(v)
for v in vals:
yum(v)
(note: merging together the two for loops is not an option).
An issue arises if the function gets called with an iterable which is not a list. In the following call the yum function is never executed:
crunch(iter(range(4)))
We could in principle fix this by redefining the crunch function as follows:
def crunch (vals):
vals = list(vals)
for v in vals:
chomp(v)
for v in vals:
yum(v)
But this would result in using twice the memory if the call to crunch is:
hugeList = list(longDataStream)
crunch(hugeList)
We could fix this by defining crunch like this:
def crunch (vals):
if type(vals) is not list:
vals = list(vals)
for v in vals:
chomp(v)
for v in vals:
yum(v)
But still there could be the case in which the calling code stores data in something which:
cannot be consumed
is not a list
For instance:
from collections import deque
hugeDeque = deque(longDataStream)
crunch(hugeDeque)
It would be nice to have an isconsumable predicate, so that we could define crunch like this:
def crunch (vals):
if isconsumable(vals):
vals = list(vals)
for v in vals:
chomp(v)
for v in vals:
yum(v)
Is there a solution for this problem?
One possibility is to test whether the item is a Sequence, using isinstance(val, collections.Sequence). Non-consumability still isn't totally guaranteed but I think it's about the best you can get. A Python sequence has to have a length, which means that at least it can't be an open-ended iterator, and in general implies that the elements have to be known ahead of time, which in turn implies that they can be iterated over without consuming them. It's still possible to write pathological classes that fit the sequence protocol but aren't re-iterable, but you'll never be able to handle those.
Note that neither Iterable nor Iterator is the appropriate choice, because these types don't guarantee a length, and hence can't guarantee that the iteration will even be finite, let alone repeatable. You could, however, check for both Sized and Iterable.
The important thing is to document that your function will iterate over its argument twice, thus warning users that they must pass in an object that supports this.
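A minimal sketch of such a check (chomp and yum are the question's placeholder functions; on the Python 2 versions this thread dates from, the ABCs live in collections rather than collections.abc):
from collections.abc import Iterable, Sized

def isconsumable(vals):
    # Heuristic: anything that is not both Sized and Iterable is treated as one-shot
    return not (isinstance(vals, Sized) and isinstance(vals, Iterable))

def crunch(vals):
    if isconsumable(vals):
        vals = list(vals)
    for v in vals:
        chomp(v)
    for v in vals:
        yum(v)

chomp = yum = print            # stand-ins for the question's functions

crunch(iter(range(4)))         # the iterator gets copied, so yum() runs too
crunch([1, 2, 3])              # lists are left alone (no extra copy)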
Another, additional option could be to query if the iterable is its own iterator:
if iter(vals) is vals:
vals = list(vals)
because in this case, it is just an iterator.
This works with generators, iterators, files and many other objects which are designed for "one run" - in other words, all iterables which are iterators themselves, because an iterator returns self from its __iter__().
But this might not be enough, because there are objects which empty themselves on iteration without being their own iterator.
Normally, a self-consuming object will be its own iterator, but there are cases where this might not be allowed.
Imagine a class which wraps a list and empties this list on iteration, such as
class ListPart(object):
"""Consume a list piece by piece."""
def __init__(self, data=None):
if data is None: data = []
self.data = data
def next(self):
try:
return self.data.pop(0)
except IndexError:
raise StopIteration
def __iter__(self):
return self
def __len__(self): # doesn't work with __getattr__...
return len(self.data)
which you call like
l = [1, 2, 3, 4]
lp = ListPart(l)
for i in lp: process(i)
# now l is empty.
If I add now additional data to that list and iterate over the same object again, I'll get the new data which is a breach of the protocol:
The intention of the protocol is that once an iterator’s next() method raises StopIteration, it will continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken. (This constraint was added in Python 2.3; in Python 2.2, various iterators are broken according to this rule.)
So in this case, the object would have to return an iterator distinct from itself despite being self-consuming. Here, that could be done with
def __iter__(self):
while True:
try:
yield self.data.pop(0)
except IndexError: # pop from empty list
return
which returns a new generator on each iteration - something which would slip through the net in the case we are discussing.
import itertools

def crunch (vals):
vals1, vals2 = itertools.tee(vals, 2)
for v in vals1:
chomp(v)
for v in vals2:
yum(v)
In this case, tee will end up storing the entirety of vals internally, since one iterator is completed before the other one is started.
Many answers come close to the point but miss it.
An Iterator is an object that is consumed by iterating over it. There is no way around it. Examples of iterator objects are those returned by a call to iter(), or those returned by the functions in the itertools module.
The proper way to check whether an object is an iterator is to call isinstance(obj, Iterator). This basically checks whether the object implements the next() method (__next__() in Python 3) but you don't need to care about this.
So, remember, an iterator is always consumed. For example:
# suppose you have a list
my_list = [10, 20, 30]
# and build an iterator on the list
my_iterator = iter(my_list)
# iterate the first time over the object
for x in my_iterator:
print x
# then again
for x in my_iterator:
print x
This will print the content of the list just once.
Then there are Iterable objects. When you call iter() on an iterable it will return an iterator. While commenting on this page I made an error myself, so I will clarify it here. Iterable objects are not required to return a new iterator on every call. Many iterators are themselves iterables (i.e. you can call iter() on them) and they will return the object itself.
A simple example for this are list iterators. iter(my_list) and iter(iter(my_list)) are the same object, and this is basically what #glglgl answer is checking for.
The iterator protocol requires iterator objects to return themselves as their own iterator (and thus be iterable). This is not required for the iteration mechanics to work, but without it you wouldn't be able to use the iterator object directly in a for loop.
All of this said, what you should do is check whether you're given an Iterator, and if that's the case, make a copy of the result of the iteration (with list()). Your isconsumable(obj) is (as someone already said) isinstance(obj, Iterator).
Note that this also works for xrange(). xrange(10) returns an xrange object. Every time you iterate over an xrange object it returns a new iterator starting from the beginning, so you're fine and don't need to make a copy.
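For instance, a quick sketch using Python 3's range, which behaves like the xrange discussed here:
r = range(3)                # re-iterable: not its own iterator
print(iter(r) is r)         # False: every iter() call returns a fresh iterator
print(list(r), list(r))     # [0, 1, 2] [0, 1, 2] -- iterating twice works fine

it = iter(r)
print(iter(it) is it)       # True: the iterator is its own iterator, and is one-shot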
Here is a summary of definitions.
container
An object with a __contains__ method
generator
A function which returns an iterator.
iterable
An object with an __iter__() or __getitem__() method.
Examples of iterables include all sequence types (such as list,
str, and tuple) and some non-sequence types like dict and file.
When an iterable object is passed as an argument to the builtin
function iter(), it returns an iterator for the object. This
iterator is good for one pass over the set of values.
iterator
An iterable which has a next() method.
Iterators are required to have an
__iter__() method that returns the iterator object itself.
An iterator is
good for one pass over the set of values.
sequence
An iterable which supports efficient element access using integer
indices
via the __getitem__() special method and defines a __len__() method that returns
the length of the sequence.
Some built-in sequence types are list, str,
tuple, and unicode.
Note that dict also supports __getitem__() and
__len__(), but is considered a mapping rather than a sequence because the
lookups use arbitrary immutable keys rather than integers.
Now there is a multitude of ways of testing if an object is an iterable, or iterator, or sequence of some sort. Here is a summary of these ways, and how they classify various kinds of objects:
Iterable Iterator iter_is_self Sequence MutableSeq
object
[] True False False True True
() True False False True False
set([]) True False False False False
{} True False False False False
deque([]) True False False False False
<listiterator> True True True False False
<generator> True True True False False
string True False False True False
unicode True False False True False
<open> True True True False False
xrange(1) True False False True False
Foo.__iter__ True False False False False
Sized has_len has_iter has_contains
object
[] True True True True
() True True True True
set([]) True True True True
{} True True True True
deque([]) True True True False
<listiterator> False False True False
<generator> False False True False
string True True False True
unicode True True False True
<open> False False True False
xrange(1) True True True False
Foo.__iter__ False False True False
Each column refers to a different way to classify iterables, and each row refers to a different kind of object.
import pandas as pd
import collections
import os
import types
def col_iterable(obj):
return isinstance(obj, collections.Iterable)
def col_iterator(obj):
return isinstance(obj, collections.Iterator)
def col_sequence(obj):
return isinstance(obj, collections.Sequence)
def col_mutable_sequence(obj):
return isinstance(obj, collections.MutableSequence)
def col_sized(obj):
return isinstance(obj, collections.Sized)
def has_len(obj):
return hasattr(obj, '__len__')
def listtype(obj):
return isinstance(obj, types.ListType)
def tupletype(obj):
return isinstance(obj, types.TupleType)
def has_iter(obj):
"Could this be a way to distinguish basestrings from other iterables?"
return hasattr(obj, '__iter__')
def has_contains(obj):
return hasattr(obj, '__contains__')
def iter_is_self(obj):
"Seems identical to col_iterator"
return iter(obj) is obj
def gen():
yield
def short_str(obj):
text = str(obj)
if text.startswith('<'):
text = text.split()[0] + '>'
return text
def isiterable():
class Foo(object):
def __init__(self):
self.data = [1, 2, 3]
def __iter__(self):
while True:
try:
yield self.data.pop(0)
except IndexError: # pop from empty list
return
def __repr__(self):
return "Foo.__iter__"
filename = 'mytestfile'
f = open(filename, 'w')
objs = [list(), tuple(), set(), dict(),
collections.deque(), iter([]), gen(), 'string', u'unicode',
f, xrange(1), Foo()]
tests = [
(short_str, 'object'),
(col_iterable, 'Iterable'),
(col_iterator, 'Iterator'),
(iter_is_self, 'iter_is_self'),
(col_sequence, 'Sequence'),
(col_mutable_sequence, 'MutableSeq'),
(col_sized, 'Sized'),
(has_len, 'has_len'),
(has_iter, 'has_iter'),
(has_contains, 'has_contains'),
]
funcs, labels = zip(*tests)
data = [[test(obj) for test in funcs] for obj in objs]
f.close()
os.unlink(filename)
df = pd.DataFrame(data, columns=labels)
df = df.set_index('object')
print(df.ix[:, 'Iterable':'MutableSeq'])
print
print(df.ix[:, 'Sized':])
isiterable()