Suppose I'm writing a function taking in an iterable, and my function wants to be agnostic as to whether that iterable is actually an iterator yet or not.
(This is a common situation, right? I think basically all the itertools functions are written this way. Take in an iterable, return an iterator.)
If I call, for instance, itertools.tee(•, 2) on an object, and it happens to not be an iterator yet, that presumably means it would be cheaper just to call iter on it twice to get my two independent iterators. Are itertools functions smart enough to know this, and if not, what's the best way to avoid unnecessary costs in this way?
Observe:
>>> def foo(x):
...     return x.__iter__()  # or return iter(x)
...
>>> l = [0, 1]
>>> it = l.__iter__()
>>> it
<list_iterator object at 0x00000190F59C3640>
>>> print(foo(l), foo(it))
<list_iterator object at 0x00000190F5980AF0> <list_iterator object at 0x00000190F59C3640>
So you do not need to worry about whether the argument to your function is an iterable or already an iterator. Calling __iter__ on something that is already an iterator simply returns self. This is not an expensive call, and it is cheaper than any test you could run to see whether it is an iterator, such as checking for a __next__ method (after which you would have to call __iter__ on it anyway if it doesn't have one).
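If you nevertheless want to branch for a function like tee, the cheap identity check iter(obj) is obj distinguishes the two cases. A minimal sketch (two_iterators is a hypothetical helper, not part of itertools):
import itertools

def two_iterators(obj):
    # iter() on a non-iterator returns a fresh, independent iterator
    # each time, so calling it twice avoids tee's internal buffering.
    if iter(obj) is not obj:
        return iter(obj), iter(obj)
    # obj is an iterator (iter returned the object itself): fall back
    # to tee, which buffers items as needed.
    return itertools.tee(obj, 2)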
Update
We now see that there is a big difference between passing your function an iterable and passing it an iterator (depending on how the iterator is written, of course), since calling iter twice on the former gives you two distinct iterators, while calling iter twice on the latter does not. itertools.tee, as an example, expects an iterable. If you pass it an iterator whose __iter__ returns self, it will clearly still work, since tee does not need two independent iterators to do its magic.
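For example:
>>> l = [10, 20]
>>> iter(l) is iter(l)    # a plain iterable: a fresh iterator each time
False
>>> it = iter(l)
>>> iter(it) is iter(it)  # an iterator: __iter__ returns self
True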
But if you are writing a function that takes an iterable and is implemented internally using two or more iterators over it, what you really want to test is whether what is being passed supports multiple, concurrent, independent iterations, regardless of whether it is an iterator or just a plain iterable:
def my_iterator(iterable):
    it1 = iter(iterable)
    it2 = iter(iterable)
    if it1 is it2:
        raise ValueError('The passed iterable does not support multiple, concurrent, independent iterations.')
    ...
class Foo:
    def __init__(self, lst):
        self.lst = lst

    def __iter__(self):
        self.idx = 0
        return self

    def __next__(self):
        if self.idx < len(self.lst):
            value = self.lst[self.idx]
            self.idx += 1
            return value
        raise StopIteration()

f = Foo("abcd")
for x in f:
    print(x)
my_iterator(f)
Prints:
a
b
c
d
Traceback (most recent call last):
  File "C:\Booboo\test\test.py", line 26, in <module>
    my_iterator(f)
  File "C:\Booboo\test\test.py", line 5, in my_iterator
    raise ValueError('The passed iterable does not support multiple, concurrent, independent iterations.')
ValueError: The passed iterable does not support multiple, concurrent, independent iterations.
The author of the passed iterable must write it in such a way that it supports multiple, concurrent, independent iterations.
What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
An iterator is the more general concept: any object whose class has a __next__ method (next in Python 2) and an __iter__ method that returns self.
Every generator is an iterator, but not vice versa. A generator is built by calling a function that has one or more yield expressions (yield statements, in Python 2.5 and earlier), and is an object that meets the previous paragraph's definition of an iterator.
You may want to use a custom iterator, rather than a generator, when you need a class with somewhat complex state-maintaining behavior, or want to expose other methods besides __next__ (and __iter__ and __init__). Most often, a generator (sometimes, for sufficiently simple needs, a generator expression) is sufficient, and it's simpler to code because state maintenance (within reasonable limits) is basically "done for you" by the frame getting suspended and resumed.
For example, a generator such as:
def squares(start, stop):
    for i in range(start, stop):
        yield i * i

generator = squares(a, b)
or the equivalent generator expression (genexp)
generator = (i*i for i in range(a, b))
would take more code to build as a custom iterator:
class Squares(object):
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):  # next in Python 2
        if self.start >= self.stop:
            raise StopIteration
        current = self.start * self.start
        self.start += 1
        return current

iterator = Squares(a, b)
But, of course, with class Squares you could easily offer extra methods, e.g.
    def current(self):
        return self.start
if you have any actual need for such extra functionality in your application.
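For example, assuming Squares with the current method shown above:
sq = Squares(1, 5)
print(next(sq))      # 1 (the square of 1)
print(sq.current())  # 2: the next base value that will be squared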
In summary: Iterators are objects that have an __iter__ and a __next__ (next in Python 2) method. Generators provide an easy, built-in way to create instances of Iterators.
A function with yield in it is still a function, that, when called, returns an instance of a generator object:
def a_function():
    "when called, returns generator object"
    yield
A generator expression also returns a generator:
a_generator = (i for i in range(0))
For a more in-depth exposition and examples, keep reading.
A Generator is an Iterator
Specifically, a generator is a subtype of an iterator.
>>> import collections.abc, types
>>> issubclass(types.GeneratorType, collections.abc.Iterator)
True
(These ABCs live in collections.abc; the bare collections.Iterator aliases used in older answers were removed in Python 3.10.)
We can create a generator several ways. A very common and simple way to do so is with a function.
Specifically, a function with yield in it is a function, that, when called, returns a generator:
>>> def a_function():
...     "just a function definition with yield in it"
...     yield
...
>>> type(a_function)
<class 'function'>
>>> a_generator = a_function() # when called
>>> type(a_generator) # returns a generator
<class 'generator'>
And a generator, again, is an Iterator:
>>> isinstance(a_generator, collections.abc.Iterator)
True
An Iterator is an Iterable
An Iterator is an Iterable,
>>> issubclass(collections.abc.Iterator, collections.abc.Iterable)
True
which requires an __iter__ method that returns an Iterator:
>>> collections.abc.Iterable()
Traceback (most recent call last):
  File "<pyshell#79>", line 1, in <module>
    collections.abc.Iterable()
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__
Some examples of iterables are the built-in tuples, lists, dictionaries, sets, frozen sets, strings, byte strings, byte arrays, ranges and memoryviews:
>>> all(isinstance(element, collections.abc.Iterable) for element in (
...     (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
Iterators require a next or __next__ method
In Python 2:
>>> collections.Iterator()
Traceback (most recent call last):
  File "<pyshell#80>", line 1, in <module>
    collections.Iterator()
TypeError: Can't instantiate abstract class Iterator with abstract methods next
And in Python 3:
>>> collections.abc.Iterator()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Iterator with abstract methods __next__
We can get the iterators from the built-in objects (or custom objects) with the iter function:
>>> all(isinstance(iter(element), collections.abc.Iterator) for element in (
...     (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
The __iter__ method is called when you attempt to use an object with a for-loop. Then the __next__ method is called on the iterator object to get each item out for the loop. The iterator raises StopIteration when you have exhausted it, and it cannot be reused at that point.
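In other words, for item in obj: ... is roughly equivalent to this sketch:
it = iter(obj)            # calls obj.__iter__()
while True:
    try:
        item = next(it)   # calls it.__next__()
    except StopIteration:
        break             # the for loop ends here, silently
    ...                   # the loop body runs with item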
From the documentation
From the Generator Types section of the Iterator Types section of the Built-in Types documentation:
Python’s generators provide a convenient way to implement the iterator protocol. If a container object’s __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and next() [__next__() in Python 3] methods. More information about generators can be found in the documentation for the yield expression.
So from this we learn that Generators are a (convenient) type of Iterator.
Example Iterator Objects
You might create an object that implements the Iterator protocol by writing your own class, optionally extending the Iterator ABC.
class Yes(collections.abc.Iterator):
    def __init__(self, stop):
        self.x = 0
        self.stop = stop

    def __iter__(self):
        return self

    def next(self):
        if self.x < self.stop:
            self.x += 1
            return 'yes'
        else:
            # Iterators must raise when done, else considered broken
            raise StopIteration

    __next__ = next  # Python 3 compatibility
But it's easier to simply use a Generator to do this:
def yes(stop):
    for _ in range(stop):
        yield 'yes'
Or perhaps simpler, a Generator Expression (works similarly to list comprehensions):
yes_expr = ('yes' for _ in range(stop))
They can all be used in the same way:
>>> stop = 4
>>> for i, y1, y2, y3 in zip(range(stop), Yes(stop), yes(stop),
...                          ('yes' for _ in range(stop))):
...     print('{0}: {1} == {2} == {3}'.format(i, y1, y2, y3))
...
0: yes == yes == yes
1: yes == yes == yes
2: yes == yes == yes
3: yes == yes == yes
Conclusion
You can use the Iterator protocol directly when you need to extend a Python object as an object that can be iterated over.
However, in the vast majority of cases, you are best served by using yield to define a function that returns a generator iterator, or by using generator expressions.
Finally, note that generators provide even more functionality as coroutines. I explain Generators, along with the yield statement, in depth on my answer to "What does the “yield” keyword do?".
Iterators are objects on which you call next() to get the successive values of a sequence.
Generators are functions that produce or yield a sequence of values using the yield keyword.
Every next() call on a generator object (for example, f below) returned by a generator function (for example, foo() below) generates the next value in the sequence.
When a generator function is called, it returns a generator object without even beginning execution of the function body. When next() is called for the first time, the function executes until it reaches a yield statement, which returns the yielded value. The yield remembers where execution stopped, and the following next() call resumes from that point.
The following example demonstrates the interplay between yield and the call to the next method on a generator object.
>>> def foo():
...     print("begin")
...     for i in range(3):
...         print("before yield", i)
...         yield i
...         print("after yield", i)
...     print("end")
...
>>> f = foo()
>>> next(f)
begin
before yield 0 # Control is in for loop
0
>>> next(f)
after yield 0
before yield 1 # Continue for loop
1
>>> next(f)
after yield 1
before yield 2
2
>>> next(f)
after yield 2
end
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Adding an answer because none of the existing answers specifically address the confusion in the official literature.
Generator functions are ordinary functions defined using yield instead of return. When called, a generator function returns a generator object, which is a kind of iterator - it has a next() method. When you call next(), the next value yielded by the generator function is returned.
Either the function or the object may be called the "generator" depending on which Python source document you read. The Python glossary says generator functions, while the Python wiki implies generator objects. The Python tutorial remarkably manages to imply both usages in the space of three sentences:
Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called on it, the generator resumes where it left off (it remembers all the data values and which statement was last executed).
The first two sentences identify generators with generator functions, while the third sentence identifies them with generator objects.
Despite all this confusion, one can seek out the Python language reference for the clear and final word:
The yield expression is only used when defining a generator function, and can only be used in the body of a function definition. Using a yield expression in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.
When a generator function is called, it returns an iterator known as a generator. That generator then controls the execution of a generator function.
So, in formal and precise usage, "generator" unqualified means generator object, not generator function.
The above references are for Python 2, but the Python 3 language reference says the same thing. However, the Python 3 glossary states that
generator ... Usually refers to a generator function, but may refer to a generator iterator in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity.
Everybody has a really nice and verbose answer with examples and I really appreciate it. I just wanted to give a short few lines answer for people who are still not quite clear conceptually:
If you create your own iterator, it is a little bit involved: you have to create a class and at least implement the __iter__ and __next__ methods. But what if you don't want to go through this hassle and just want to create an iterator quickly? Fortunately, Python provides a shortcut for defining an iterator: all you need to do is define a function with at least one call to yield, and when you call that function it returns "something" which acts like an iterator (you can call next() on it and use it in a for loop). In Python, this something is called a generator.
Hope that clarifies a bit.
The examples from Ned Batchelder are highly recommended for understanding iterators and generators.
A function without generators that does something with even numbers:
def evens(stream):
    them = []
    for n in stream:
        if n % 2 == 0:
            them.append(n)
    return them
while using a generator:
def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n
We don't need a list nor a return statement, and this is efficient for large or infinite streams: the generator just walks the stream and yields one value at a time.
Calling the evens function (the generator) works as usual:
num = [...]
for n in evens(num):
    do_smth(n)
Generators are also used to break out of a double loop: move the nested loops into a generator, and a single break exits both, as sketched below.
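A minimal sketch of that trick (pairs is a made-up example generator):
def pairs(rows, cols):
    for r in range(rows):
        for c in range(cols):
            yield r, c

for r, c in pairs(3, 3):
    if r == 1 and c == 1:
        break  # a single break exits both loops at once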
Iterator
A book full of pages is an iterable; a bookmark is an iterator, and the bookmark has nothing to do except move to the next page.
litr = iter([1,2,3])
next(litr) ## 1
next(litr) ## 2
next(litr) ## 3
next(litr) ## StopIteration (exception), as we reached the end of the iterator
To use a generator ... we need a function
To use an iterator ... we need next and iter
As has been said:
A generator function returns an iterator object
The whole benefit of an iterator:
It stores one element at a time in memory
No-code 4 line cheat sheet:
A generator function is a function with yield in it.
A generator expression is like a list comprehension. It uses "()" vs "[]"
A generator object (often called 'a generator') is returned by both above.
A generator is also a subtype of iterator.
Previous answers missed this addition: a generator has a close method, while typical iterators don't. The close method raises a GeneratorExit exception inside the generator, so a finally clause in the generator gets a chance to run some clean-up. This abstraction makes generators more usable in the large than simple iterators: one can close a generator as one would close a file, without having to bother about what's underneath.
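A minimal sketch of that clean-up behavior (the file name is made up):
def read_lines(path):
    f = open(path)
    try:
        for line in f:
            yield line
    finally:
        f.close()  # runs on exhaustion or when close() is called

g = read_lines('data.txt')  # hypothetical file
print(next(g))
g.close()  # raises GeneratorExit at the paused yield; finally runs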
That said, my personal answer to the first question would be: an iterable has an __iter__ method only; a typical iterator adds a __next__ method; a generator has both __iter__ and __next__, and additionally close (plus send and throw).
For the second question, my personal answer would be: in a public interface, I tend to favor generators a lot, since they're more resilient: the close method and greater composability with yield from. Locally, I may use iterators, but only if the structure is flat and simple (iterators do not compose easily) and if there are reasons to believe the sequence is rather short, especially if it may be stopped before it reaches the end. I tend to look at iterators as a low-level primitive, except as literals.
For control-flow matters, generators are as important a concept as promises: both are abstract and composable.
It's difficult to answer the question without 2 other concepts: iterable and iterator protocol.
What is difference between iterator and iterable?
Conceptually, you iterate over an iterable with the help of the corresponding iterator. A few differences help to distinguish an iterator from an iterable in practice:
One difference is that an iterator has a __next__ method, while an iterable does not.
Another difference: both of them contain an __iter__ method. In the case of an iterable, it returns the corresponding iterator; in the case of an iterator, it returns itself.
For example:
>>> x = [1, 2, 3]
>>> dir(x)
[... __iter__ ...]
>>> x_iter = iter(x)
>>> dir(x_iter)
[... __iter__ ... __next__ ...]
>>> type(x_iter)
<class 'list_iterator'>
What are iterables in Python? list, string, range, etc. What are iterators? The objects returned by enumerate, zip, reversed, etc. We may check this using the approach above. It's kind of confusing; it would probably be easier if we had only one type. Is there any difference between range and zip? One reason for the split: range has a lot of additional functionality; we may index it or check whether it contains some number, etc. (see details here).
How can we create an iterator ourselves? Theoretically, we may implement the iterator protocol (see here): write __next__ and __iter__ methods, raise the StopIteration exception, and so on (see Alex Martelli's answer for an example and possible motivation; see also here). But in practice we use generators; that seems to be by far the main way to create iterators in Python.
I can give you a few more interesting examples that show somewhat confusing usage of those concepts in practice:
in keras we have tf.keras.preprocessing.image.ImageDataGenerator; this class doesn't have __next__ and __iter__ methods, so it's not an iterator (or generator);
if you call its flow_from_dataframe() method, you'll get a DataFrameIterator that has those methods; but it doesn't raise StopIteration (which is not common among built-in iterators in Python); in the documentation we may read that "A DataFrameIterator yielding tuples of (x, y)" - again, confusing usage of terminology;
we also have the Sequence class in keras, which is a custom implementation of generator-like functionality (regular generators are not suitable for multithreading), but it doesn't implement __next__ and __iter__; rather, it's a wrapper around generators (it uses the yield statement);
Generator Function, Generator Object, Generator:
A generator function is just like a regular function in Python, but it contains one or more yield statements. Generator functions are a great tool for creating iterator objects as easily as possible. The iterator object returned by a generator function is also called a generator object, or simply a generator.
In this example I have created a generator function which returns a generator object <generator object fib at 0x01342480>. Just like other iterators, generator objects can be used in a for loop or with the built-in function next(), which returns the next value from the generator.
def fib(max):
    a, b = 0, 1
    for i in range(max):
        yield a
        a, b = b, a + b

print(fib(10))  # <generator object fib at 0x01342480>

for i in fib(10):
    print(i)  # 0 1 1 2 3 5 8 13 21 34

myfib = fib(10)  # create a generator object to step through manually
print(next(myfib))  # 0
print(next(myfib))  # 1
print(next(myfib))  # 1
print(next(myfib))  # 2
So a generator function is the easiest way to create an Iterator object.
Iterator:
Every generator object is an iterator but not vice versa. A custom iterator object can be created if its class implements __iter__ and __next__ method (also called iterator protocol).
However, it is much easier to use a generator function to create iterators, because it simplifies their creation; but a custom iterator gives you more freedom, and you can also implement other methods according to your requirements, as shown in the example below.
class Fib:
    def __init__(self, max):
        self.current = 0
        self.next = 1
        self.max = max
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count >= self.max:
            raise StopIteration
        else:
            self.current, self.next = self.next, (self.current + self.next)
            self.count += 1
            return self.next - self.current

    def __str__(self):
        return "Generator object"

itobj = Fib(4)
print(itobj)  # Generator object

for i in Fib(4):
    print(i)  # 0 1 1 2

print(next(itobj))  # 0
print(next(itobj))  # 1
print(next(itobj))  # 1
This thread already covers the differences between the two in great detail, but I wanted to add something on the conceptual difference between them:
[...] an iterator as defined in the GoF book retrieves items from a collection, while a generator can produce items “out of thin air”. That’s why the Fibonacci sequence generator is a common example: an infinite series of numbers cannot be stored in a collection.
Ramalho, Luciano. Fluent Python (p. 415). O'Reilly Media. Kindle Edition.
Sure, it does not cover all the aspects, but I think it gives a good notion of when each can be useful.
You can compare both approaches for the same data:
def myGeneratorList(n):
    for i in range(n):
        yield i

def myIterableList(n):
    ll = n * [None]
    for i in range(n):
        ll[i] = i
    return ll
# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
    print("{} {}".format(i1, i2))
# Generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
# Generator can be read several times if converted into iterable
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
Besides, if you check the memory footprint, the generator takes much less memory as it doesn't need to store all the values in memory at the same time.
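A quick way to see this, using the two functions above (exact sizes vary by Python version):
import sys

gen = myGeneratorList(1_000_000)
lst = myIterableList(1_000_000)
print(sys.getsizeof(gen))  # a couple hundred bytes, independent of n
print(sys.getsizeof(lst))  # ~8 MB just for the list's array of references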
An iterable object is something which can be iterated over (naturally). To do that, however, you will need something like an iterator object; and yes, the terminology may be confusing. Iterable objects include an __iter__ method which will return the iterator object for the iterable object.
An iterator object is an object which implements the iterator protocol - a set of rules. In this case, it must have at least these two methods: __iter__ and __next__. The __next__ method is a function which supplies a new value. The __iter__ method returns the iterator object. In a more complex object, there may be a separate iterator, but in a simpler case, __iter__ returns the object itself (typically return self).
One iterable object is a list object. It’s not an iterator, but it has an __iter__ method which returns an iterator. You can call this method directly as things.__iter__(), or use iter(things).
If you want to iterate through any collection, you will need to use its iterator:
things_iterator = iter(things)
for i in things_iterator:
    print(i)
However, Python will automatically use the iterator, which is why you never see the above example. Instead you write:
for i in things:
    print(i)
Writing an iterator yourself can be tedious, so Python has a simpler alternative: the generator function. A generator function is not an ordinary function. Instead of running through the code and returning a final result, the code is deferred, and the function returns immediately with a generator object.
A generator object is like an iterator object in that it implements the iterator protocol. That’s good enough for most purposes. There are many examples of generators in the other answers.
In short, an iterator is an object which allows you to iterate through another object, whether it’s a collection or some other source of values. A generator is a simplified iterator which does more-or-less the same job, but is easier to implement.
Normally, you would go for a generator if that’s all you need. If, however, you’re building a more complex object which includes iteration among other features, you would use the iterator protocol instead.
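For example, here is a hypothetical container where iteration is one feature among several; its __iter__ is itself written as a generator method:
class Playlist:
    def __init__(self, songs):
        self._songs = list(songs)

    def add(self, song):
        self._songs.append(song)

    def __iter__(self):
        # A generator method: each call returns a fresh iterator.
        for song in self._songs:
            yield song

p = Playlist(['a', 'b'])
p.add('c')
print(list(p))  # ['a', 'b', 'c']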
I am writing specifically for Python newbies in a very simple way, though deep down Python does so many things.
Let’s start with the very basic:
Consider a list,
l = [1,2,3]
Let’s write an equivalent function:
def f():
    return [1,2,3]
Output of print(l): [1,2,3], and output of print(f()): [1,2,3].
Let's make list l iterable: in Python, a list is always iterable, which means you can get an iterator from it whenever you want.
Let’s apply iterator on list:
iter_l = iter(l) # iterator applied explicitly
Let's make the function iterable, i.e. write an equivalent generator function. In Python, as soon as you introduce the keyword yield, the function becomes a generator function, and an iterator is applied implicitly.
Note: every generator is always iterable, with an implicit iterator applied, and that implicit iterator is the crux.
So the generator function will be:
def f():
    yield 1
    yield 2
    yield 3

iter_f = f()  # calling f returns an iterator; iter() is applied implicitly
So, as you may have observed, as soon as you make function f a generator function, calling it already gives you an iterator.
Now,
l is the list; after applying the iterator method iter, it becomes iter(l).
f() is already an iterator; applying iter to it gives iter(iter(f())), which is the same iterator again.
It's like casting an int with int(x): x is already an int, and it remains an int.
For example, the output of:
print(type(iter(iter(l))))
is
<class 'list_iterator'>
Never forget this is Python and not C or C++
Hence the conclusion from the above explanation is:
list l ~= iter(l)
generator object f() == iter(f())
All generators are iterators, but not vice versa.
from typing import Iterator
from typing import Iterable
from typing import Generator

class IT:
    def __init__(self):
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.n == 4:
            raise StopIteration
        try:
            return self.n
        finally:
            self.n += 1
def g():
    for i in range(4):
        yield i
def test(it):
    print(f'type(it) = {type(it)}')
    print(f'isinstance(it, Generator) = {isinstance(it, Generator)}')
    print(f'isinstance(it, Iterator) = {isinstance(it, Iterator)}')
    print(f'isinstance(it, Iterable) = {isinstance(it, Iterable)}')
    print(next(it))
    print(next(it))
    print(next(it))
    print(next(it))
    try:
        print(next(it))
    except StopIteration:
        print('boom\n')

print(f'issubclass(Generator, Iterator) = {issubclass(Generator, Iterator)}')
print(f'issubclass(Iterator, Iterable) = {issubclass(Iterator, Iterable)}')
print()

test(IT())
test(g())
Output:
issubclass(Generator, Iterator) = True
issubclass(Iterator, Iterable) = True
type(it) = <class '__main__.IT'>
isinstance(it, Generator) = False
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom
type(it) = <class 'generator'>
isinstance(it, Generator) = True
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom
How do I check if my loop never ran at all?
This somehow looks too complicated to me:
x = _empty = object()
for x in data:
    ...  # process x
if x is _empty:
    raise ValueError("Empty data iterable: {!r:100}".format(data))
Ain't there an easier solution?
The above solution is from curiousefficiency.org
Update
data can contain None items.
data is an iterator, and I don't want to use it twice.
By "never ran", do you mean that data had no elements?
If so, the simplest solution is to check it before running the loop:
if not data:
    raise Exception('Empty iterable')
for x in data:
    ...
However, as mentioned in the comments below, it will not work with some iterables, like files, generators, etc., so should be applied carefully.
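In particular, generators are always truthy, even when they would yield nothing, so the check silently passes:
>>> g = (i for i in [])  # an empty generator
>>> bool(g)              # always True for generators
True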
The original code is best.
x = _empty = object()
_empty is called a sentinel value. In Python, it's common to create a sentinel with object(), since it makes it obvious that the only purpose of _empty is to be a dummy value. But you could have used any freshly created object, for instance an empty list [].
Every evaluation of object() or [] creates a brand-new object, so comparing with is is guaranteed to distinguish the sentinel from every other value, unlike immutable literals such as None or 0, which may be shared or cached.
>>> None is None
True
>>> object() is object()
False
>>> [] is []
False
I propose the following:
loop_has_run = False
for x in data:
    loop_has_run = True
    ...  # process x
if not loop_has_run:
    raise ValueError("Empty data iterable: {!r:100}".format(data))
I contend that this is better than the example in the question, because:
The intent is clearer (since the variable name specifies its meaning directly).
No objects are created or destroyed (which can have a negative performance impact).
It doesn't require paying attention to the subtle point that object() always returns a unique value.
Note that the loop_has_run = True assignment should be put at the start of the loop, in case (for example) the loop body contains break.
The following simple solution works with any iterable. It is based on the idea that we can check if there is a (first) element, and then keep iterating if there was one. The result is much clearer:
import itertools

try:
    first_elmt = next(data)
except StopIteration:
    raise ValueError("Empty data iterator: {!r:100}".format(data))

for x in itertools.chain([first_elmt], data):
    …
PS: Note that it assumes that data is an iterator (as in the question). If it is merely an iterable, the code should be run on data_iter = iter(data) instead of on data (otherwise, say if data is a list, the loop would duplicate the first element).
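A sketch of that iterable-safe variant (the function name and the body placeholder are made up):
import itertools

def process_all(data):
    data_iter = iter(data)  # works for both iterables and iterators
    try:
        first_elmt = next(data_iter)
    except StopIteration:
        raise ValueError("Empty data iterable: {!r:100}".format(data))
    for x in itertools.chain([first_elmt], data_iter):
        ...  # process x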
The intent of that code isn't immediately obvious. Sure people would understand it after a while, but the code could be made clearer.
The solution I offer requires more lines of code, but that code is in a class that can be stored elsewhere. In addition this solution will work for iterables and iterators as well as sized containers.
Your code would be changed to:
it = HadItemsIterable(data)
for x in it:
    ...
if it.had_items:
    ...
The code for the class is as follows:
from collections.abc import Iterable

class HadItemsIterable(Iterable):
    def __init__(self, iterable):
        self._iterator = iter(iterable)

    @property
    def had_items(self):
        try:
            return self._had_items
        except AttributeError as e:
            raise ValueError("Not iterated over items yet")

    def __iter__(self):
        try:
            first = next(self._iterator)
        except StopIteration:
            if hasattr(self, "_had_items"):
                raise
            self._had_items = False
            raise
        self._had_items = True
        yield first
        yield from self._iterator
You can add a loop_flag that defaults to False; when the loop executes, change it to True:
loop_flag = False
for x in data:
    loop_flag = True
    ...  # process x
if loop_flag:
    print("loop executed...")
What about this solution?
data = []
count = None
for count, item in enumerate(data):
    print(item)
if count is None:
    raise ValueError('data is empty')
Python generators are very useful. They have advantages over functions that return lists. However, you could len(list_returning_function()). Is there a way to len(generator_function())?
UPDATE:
Of course len(list(generator_function())) would work...
I'm trying to use a generator I've created inside a new generator I'm creating. As part of the calculation in the new generator it needs to know the length of the old one. However I would like to keep both of them together with the same properties as a generator, specifically - not maintain the entire list in memory as it may be very long.
UPDATE 2:
Assume the generator knows its target length even from the first step. Also, there's no reason to maintain the len() syntax. For example: if functions in Python are objects, couldn't I assign the length to a variable of this object that would be accessible to the new generator?
The conversion to list that's been suggested in the other answers is the best way if you still want to process the generator elements afterwards, but has one flaw: It uses O(n) memory. You can count the elements in a generator without using that much memory with:
sum(1 for x in generator)
Of course, be aware that this might be slower than len(list(generator)) in common Python implementations, and if the generators are long enough for the memory complexity to matter, the operation would take quite some time. Still, I personally prefer this solution as it describes what I want to get, and it doesn't give me anything extra that's not required (such as a list of all the elements).
Also listen to delnan's advice: If you're discarding the output of the generator it is very likely that there is a way to calculate the number of elements without running it, or by counting them in another manner.
Generators have no length; they aren't collections, after all.
Generators are functions with internal state (and fancy syntax). You can repeatedly call them to get a sequence of values, so you can use them in a loop. But they don't contain any elements, so asking for the length of a generator is like asking for the length of a function.
if functions in Python are objects, couldn't I assign the length to a
variable of this object that would be accessible to the new generator?
Functions are objects, and you actually can assign new attributes to function objects; generator objects, however, accept no new attributes (they have no __dict__). The reason is probably to keep such a basic object as efficient as possible.
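Both halves of that are easy to check (a quick sketch):
def f():
    yield 1

f.length = 10   # fine: function objects accept new attributes
g = f()
# g.length = 10  # AttributeError: generator objects have no __dict__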
You can however simply return (generator, length) pairs from your functions or wrap the generator in a simple object like this:
class GeneratorLen(object):
    def __init__(self, gen, length):
        self.gen = gen
        self.length = length

    def __len__(self):
        return self.length

    def __iter__(self):
        return self.gen

g = some_generator()
h = GeneratorLen(g, 1)
print(len(h), list(h))
Suppose we have a generator:
def gen():
    for i in range(10):
        yield i
We can wrap the generator, along with the known length, in an object:
import itertools

class LenGen(object):
    def __init__(self, gen, length):
        self.gen = gen
        self.length = length

    def __call__(self):
        return itertools.islice(self.gen(), self.length)

    def __len__(self):
        return self.length

lgen = LenGen(gen, 10)
Instances of LenGen are not generators themselves, but calling one returns an iterator, just like calling a generator function.
Now we can use the lgen generator in place of gen, and access len(lgen) as well:
def new_gen():
    for i in lgen():
        yield float(i) / len(lgen)

for i in new_gen():
    print(i)
You can use len(list(generator_function())). However, this consumes the generator, and it's the only way to find out how many elements are generated. So you may want to save the list somewhere if you also want to use the items:
a = list(generator_function())
print(len(a))
print(a[0])
You can len(list(generator)) but you could probably make something more efficient if you really intend to discard the results.
You can use reduce.
For Python 3:
>>> import functools
>>> def gen():
...     yield 1
...     yield 2
...     yield 3
...
>>> functools.reduce(lambda x, y: x + 1, gen(), 0)
3
In Python 2, reduce is in the global namespace so the import is unnecessary.
You can use send as a hack:
def counter():
    length = 10
    i = 0
    while i < length:
        val = (yield i)
        if val == 'length':
            yield length
        i += 1

it = counter()
print(next(it))
# 0
print(next(it))
# 1
print(it.send('length'))
# 10
print(next(it))
# 2
print(next(it))
# 3
You can combine the benefits of generators with the certainty of len(), by creating your own iterable object:
class MyIterable(object):
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __iter__(self):
        self._gen = self._generator()
        return self

    def _generator(self):
        # Put your generator code here
        i = 0
        while i < self.n:
            yield i
            i += 1

    def __next__(self):  # next in Python 2
        return next(self._gen)

mi = MyIterable(100)
print(len(mi))
for i in mi:
    print(i, end=' ')
This is basically a simple implementation of range (xrange in Python 2), which returns an object you can take the len of, but doesn't create an explicit list.