Python generator of generators?

I wrote a class that reads a txt file. The file is composed of blocks of non-empty lines (let's call them "sections"), separated by an empty line:
line1.1
line1.2
line1.3

line2.1
line2.2
My first implementation was to read the whole file and return a list of lists, that is a list of sections, where each section is a list of lines.
This was obviously terrible memory-wise.
So I re-implemented it as a generator of lists: at every cycle my class reads a whole section into memory as a list and yields it.
This is better, but it's still problematic for large sections. So I wonder whether I can reimplement it as a generator of generators. The problem is that this class is very generic, and it should be able to satisfy both of these use cases:
1. read a very big file, containing very big sections, and cycle through it only once. A generator of generators is perfect for this.
2. read a smallish file into memory to be cycled over multiple times. A generator of lists works fine, because the user can just invoke
list(MyClass(file_handle))
However, a generator of generators would NOT work in case 2, as the inner objects would not be transformed to lists.
Is there anything more elegant than implementing an explicit to_list() method, that would transform the generator of generators into a list of lists?

Python 2:
map(list, generator_of_generators)
Python 3:
list(map(list, generator_of_generators))
or for both:
[list(gen) for gen in generator_of_generators]
Since the generated objects are generator functions, not mere generators, you'd want to do
[list(gen()) for gen in generator_of_generator_functions]
If that doesn't work I have no idea what you're asking. Also, why would it return a generator function and not a generator itself?
Since in the comments you said you wanted to avoid list(generator_of_generator_functions) from crashing mysteriously, this depends on what you really want.
It is not possible to overwrite the behaviour of list in this way: either you store the sub-generator's elements or you don't.
If you really do get a crash, I recommend exhausting the sub-generator with the main generator loop every time the main generator iterates. This is standard practice and exactly what itertools.groupby does, a stdlib generator-of-generators.
eg.
def metagen():
    def innergen():
        yield 1
        yield 2
        yield 3

    for i in range(3):
        r = innergen()
        yield r
        for _ in r: pass
Or use a dark, secret hack method that I'll show in a mo' (I need to write it), but don't do it!
As promised, the hack (for Python 3, this time 'round):
from collections import UserList
from functools import partial

def objectitemcaller(key):
    def inner(*args, **kwargs):
        try:
            return getattr(object, key)(*args, **kwargs)
        except AttributeError:
            return NotImplemented
    return inner

class Listable(UserList):
    def __init__(self, iterator):
        self.iterator = iterator
        self.iterated = False

    def __iter__(self):
        return self

    def __next__(self):
        self.iterated = True
        return next(self.iterator)

    def _to_list_hack(self):
        self.data = list(self)
        del self.iterated
        del self.iterator
        self.__class__ = UserList

for key in UserList.__dict__.keys() - Listable.__dict__.keys():
    if key not in ["__class__", "__dict__", "__module__", "__subclasshook__"]:
        setattr(Listable, key, objectitemcaller(key))
def metagen():
    def innergen():
        yield 1
        yield 2
        yield 3

    for i in range(3):
        r = Listable(innergen())
        yield r
        if not r.iterated:
            r._to_list_hack()
        else:
            for item in r: pass

for item in metagen():
    print(item)
    print(list(item))
#>>> <Listable object at 0x7f46e4a4b850>
#>>> [1, 2, 3]
#>>> <Listable object at 0x7f46e4a4b950>
#>>> [1, 2, 3]
#>>> <Listable object at 0x7f46e4a4b990>
#>>> [1, 2, 3]

list(metagen())
#>>> [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
It's so bad I don't want to even explain it.
The key is that you have a wrapper that can detect whether it has been iterated, and if not you run a _to_list_hack that, I kid you not, changes the __class__ attribute.
Because of conflicting layouts we have to use the UserList class and shadow all of its methods, which is just another layer of crud.
Basically, please don't use this hack. You can enjoy it as humour, though.

A rather pragmatic way would be to tell the "generator of generators" upon creation whether to generate generators or lists. While this is not as convenient as having list magically know what to do, it still seems to be more comfortable than having a special to_list function.
def gengen(n, listmode=False):
    for i in range(n):
        def gen():
            for k in range(i+1):
                yield k
        yield list(gen()) if listmode else gen()
Depending on the listmode parameter, this can either be used to generate generators or lists.
for gg in gengen(5, False):
    print gg, list(gg)

print list(gengen(5, True))
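Coming back to the original file-splitting use case, here is a rough sketch (not from any of the answers above) of a section reader built on itertools.groupby, the stdlib generator-of-generators mentioned earlier; file_handle is assumed to be any iterable of lines:

from itertools import groupby

def sections(file_handle):
    # group consecutive lines by whether they are blank; keep only the non-blank runs
    for is_blank, lines in groupby(file_handle, key=lambda line: not line.strip()):
        if not is_blank:
            yield lines   # an iterator over one section's lines

Each yielded object is a sub-iterator that groupby invalidates as soon as the outer loop advances, so this fits case 1; for case 2 you would still need something like [list(s) for s in sections(f)].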

Related

Applying different functions to generator depending on its length

Problem
I have an iterator spam to which I want to apply the function foo if it generates few items, and bar otherwise. In other words, I wish to translate the following code for iterables to generators:
if len(spam) <= max_size_for_foo:
    foo(spam)
else:
    bar(spam)
max_size_for_foo is comparatively small, so there is no problem if a list or other iterable of this length is created, but if spam is long, it must not be converted to a list (otherwise, memory problems ensue).
Dissatisfying solution
The best solution I could come up with so far is the following:
from itertools import chain

first_items = []
try:
    for i in range(max_size_for_foo+1):
        first_items.append(next(my_generator))
except StopIteration:
    foo(first_items)
else:
    my_generator = chain(first_items, my_generator)
    bar(my_generator)
However, extracting a temporary list and chaining back into the generator feels rather dirty and inelegant to me.
Question
Is there a more elegant or Pythonesque way to do this?
The easiest way is probably to define your generator in a function so that it can be reused:
def spam_func():
    return (i for i in [1, 2, 3])

spam_length = sum(1 for _ in spam_func())

if spam_length <= max_size_for_foo:
    foo(spam_func())
else:
    bar(spam_func())
There is no such thing as generator length, because a generator can easily be infinite:
def square():
    a = 0
    while True:
        yield a**2
        a += 1
One solution is sum(1 for _ in gen).
Another option may be to wrap your generator in a class, something like this:
class wrapper:
    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def generate(self):
        yield from self.items
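For reference, the peek-ahead from the question can also be written with itertools.islice; this is only a sketch, using the question's own names (my_generator, foo, bar, max_size_for_foo):

from itertools import chain, islice

first_items = list(islice(my_generator, max_size_for_foo + 1))
if len(first_items) <= max_size_for_foo:
    foo(first_items)                         # the generator was short: hand over the prefix
else:
    bar(chain(first_items, my_generator))    # it was long: stitch the prefix back onto the rest

This is the same logic as the try/except version, with the StopIteration handling hidden inside islice.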

Difference between generator, iterator and iterator protocol [duplicate]

What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
An iterator is a more general concept: any object whose class has a __next__ method (next in Python 2) and an __iter__ method that does return self.
Every generator is an iterator, but not vice versa. A generator is built by calling a function that has one or more yield expressions (yield statements, in Python 2.5 and earlier), and is an object that meets the previous paragraph's definition of an iterator.
You may want to use a custom iterator, rather than a generator, when you need a class with somewhat complex state-maintaining behavior, or want to expose other methods besides __next__ (and __iter__ and __init__). Most often, a generator (sometimes, for sufficiently simple needs, a generator expression) is sufficient, and it's simpler to code because state maintenance (within reasonable limits) is basically "done for you" by the frame getting suspended and resumed.
For example, a generator such as:
def squares(start, stop):
    for i in range(start, stop):
        yield i * i

generator = squares(a, b)
or the equivalent generator expression (genexp)
generator = (i*i for i in range(a, b))
would take more code to build as a custom iterator:
class Squares(object):
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self): return self

    def __next__(self):  # next in Python 2
        if self.start >= self.stop:
            raise StopIteration
        current = self.start * self.start
        self.start += 1
        return current

iterator = Squares(a, b)
But, of course, with class Squares you could easily offer extra methods, i.e.
def current(self):
    return self.start
if you have any actual need for such extra functionality in your application.
What is the difference between iterators and generators? Some examples for when you would use each case would be helpful.
In summary: Iterators are objects that have an __iter__ and a __next__ (next in Python 2) method. Generators provide an easy, built-in way to create instances of Iterators.
A function with yield in it is still a function, that, when called, returns an instance of a generator object:
def a_function():
    "when called, returns generator object"
    yield
A generator expression also returns a generator:
a_generator = (i for i in range(0))
For a more in-depth exposition and examples, keep reading.
A Generator is an Iterator
Specifically, generator is a subtype of iterator.
>>> import collections, types
>>> issubclass(types.GeneratorType, collections.Iterator)
True
We can create a generator several ways. A very common and simple way to do so is with a function.
Specifically, a function with yield in it is a function, that, when called, returns a generator:
>>> def a_function():
...     "just a function definition with yield in it"
...     yield
...
>>> type(a_function)
<class 'function'>
>>> a_generator = a_function() # when called
>>> type(a_generator) # returns a generator
<class 'generator'>
And a generator, again, is an Iterator:
>>> isinstance(a_generator, collections.Iterator)
True
An Iterator is an Iterable
An Iterator is an Iterable,
>>> issubclass(collections.Iterator, collections.Iterable)
True
which requires an __iter__ method that returns an Iterator:
>>> collections.Iterable()
Traceback (most recent call last):
File "<pyshell#79>", line 1, in <module>
collections.Iterable()
TypeError: Can't instantiate abstract class Iterable with abstract methods __iter__
Some examples of iterables are the built-in tuples, lists, dictionaries, sets, frozen sets, strings, byte strings, byte arrays, ranges and memoryviews:
>>> all(isinstance(element, collections.Iterable) for element in (
...     (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
Iterators require a next or __next__ method
In Python 2:
>>> collections.Iterator()
Traceback (most recent call last):
File "<pyshell#80>", line 1, in <module>
collections.Iterator()
TypeError: Can't instantiate abstract class Iterator with abstract methods next
And in Python 3:
>>> collections.Iterator()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: Can't instantiate abstract class Iterator with abstract methods __next__
We can get the iterators from the built-in objects (or custom objects) with the iter function:
>>> all(isinstance(iter(element), collections.Iterator) for element in (
...     (), [], {}, set(), frozenset(), '', b'', bytearray(), range(0), memoryview(b'')))
True
The __iter__ method is called when you attempt to use an object with a for-loop. Then the __next__ method is called on the iterator object to get each item out for the loop. The iterator raises StopIteration when you have exhausted it, and it cannot be reused at that point.
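A rough sketch of what a for loop does behind the scenes (the variable names here are illustrative):

iterable = [1, 2, 3]

# roughly equivalent to: for item in iterable: print(item)
iterator = iter(iterable)       # calls iterable.__iter__()
while True:
    try:
        item = next(iterator)   # calls iterator.__next__()
    except StopIteration:       # raised once the iterator is exhausted
        break
    print(item)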
From the documentation
From the Generator Types section of the Iterator Types section of the Built-in Types documentation:
Python’s generators provide a convenient way to implement the iterator protocol. If a container object’s __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and next() [__next__() in Python 3] methods. More information about generators can be found in the documentation for the yield expression.
(Emphasis added.)
So from this we learn that Generators are a (convenient) type of Iterator.
Example Iterator Objects
You might create an object that implements the Iterator protocol by creating or extending your own class.
class Yes(collections.Iterator):
    def __init__(self, stop):
        self.x = 0
        self.stop = stop

    def __iter__(self):
        return self

    def next(self):
        if self.x < self.stop:
            self.x += 1
            return 'yes'
        else:
            # Iterators must raise when done, else considered broken
            raise StopIteration

    __next__ = next  # Python 3 compatibility
But it's easier to simply use a Generator to do this:
def yes(stop):
    for _ in range(stop):
        yield 'yes'
Or perhaps simpler, a Generator Expression (works similarly to list comprehensions):
yes_expr = ('yes' for _ in range(stop))
They can all be used in the same way:
>>> stop = 4
>>> for i, y1, y2, y3 in zip(range(stop), Yes(stop), yes(stop),
...                          ('yes' for _ in range(stop))):
... print('{0}: {1} == {2} == {3}'.format(i, y1, y2, y3))
...
0: yes == yes == yes
1: yes == yes == yes
2: yes == yes == yes
3: yes == yes == yes
Conclusion
You can use the Iterator protocol directly when you need to extend a Python object as an object that can be iterated over.
However, in the vast majority of cases, you are best suited to use yield to define a function that returns a Generator Iterator or consider Generator Expressions.
Finally, note that generators provide even more functionality as coroutines. I explain Generators, along with the yield statement, in depth on my answer to "What does the “yield” keyword do?".
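As a minimal illustration of that coroutine usage (not part of the original answer), a generator can also receive values through send():

def running_total():
    total = 0
    while True:
        received = yield total   # send() delivers its argument here
        total += received

acc = running_total()
next(acc)            # prime the coroutine: run up to the first yield
print(acc.send(10))  # 10
print(acc.send(5))   # 15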
Iterators are objects which use the next() method to get the following values of a sequence.
Generators are functions that produce or yield a sequence of values using the yield keyword.
Every next() method call on a generator object (for example, f below) returned by a generator function (for example, foo() below) generates the next value in the sequence.
When a generator function is called, it returns a generator object without even beginning execution of the function. When the next() method is called for the first time, the function starts executing until it reaches a yield statement, which returns the yielded value. The yield keeps track of what has happened, i.e. it remembers the last execution, so the next next() call continues from that point.
The following example demonstrates the interplay between yield and the call to the next method on a generator object.
>>> def foo():
...     print("begin")
...     for i in range(3):
...         print("before yield", i)
...         yield i
...         print("after yield", i)
...     print("end")
...
>>> f = foo()
>>> next(f)
begin
before yield 0 # Control is in for loop
0
>>> next(f)
after yield 0
before yield 1 # Continue for loop
1
>>> next(f)
after yield 1
before yield 2
2
>>> next(f)
after yield 2
end
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Adding an answer because none of the existing answers specifically address the confusion in the official literature.
Generator functions are ordinary functions defined using yield instead of return. When called, a generator function returns a generator object, which is a kind of iterator - it has a next() method. When you call next(), the next value yielded by the generator function is returned.
Either the function or the object may be called the "generator" depending on which Python source document you read. The Python glossary says generator functions, while the Python wiki implies generator objects. The Python tutorial remarkably manages to imply both usages in the space of three sentences:
Generators are a simple and powerful tool for creating iterators. They are written like regular functions but use the yield statement whenever they want to return data. Each time next() is called on it, the generator resumes where it left off (it remembers all the data values and which statement was last executed).
The first two sentences identify generators with generator functions, while the third sentence identifies them with generator objects.
Despite all this confusion, one can seek out the Python language reference for the clear and final word:
The yield expression is only used when defining a generator function, and can only be used in the body of a function definition. Using a yield expression in a function definition is sufficient to cause that definition to create a generator function instead of a normal function.
When a generator function is called, it returns an iterator known as a generator. That generator then controls the execution of a generator function.
So, in formal and precise usage, "generator" unqualified means generator object, not generator function.
The above references are for Python 2 but Python 3 language reference says the same thing. However, the Python 3 glossary states that
generator ... Usually refers to a generator function, but may refer to a generator iterator in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity.
Everybody has a really nice and verbose answer with examples and I really appreciate it. I just wanted to give a short few lines answer for people who are still not quite clear conceptually:
If you create your own iterator, it is a little bit involved - you have to create a class and at least implement the __iter__ and __next__ methods. But what if you don't want to go through this hassle and just want to quickly create an iterator? Fortunately, Python provides a shortcut for defining an iterator. All you need to do is define a function with at least one yield in it, and when you call that function it returns "something" which acts like an iterator (you can call next on it and use it in a for loop). This something has a name in Python: a Generator.
Hope that clarifies a bit.
The examples from Ned Batchelder are highly recommended for iterators and generators.
A function without generators that does something with even numbers:
def evens(stream):
    them = []
    for n in stream:
        if n % 2 == 0:
            them.append(n)
    return them
while using a generator:
def evens(stream):
    for n in stream:
        if n % 2 == 0:
            yield n
We don't need any list nor a return statement.
Efficient for large/infinite streams ... it just walks along and yields the values.
Calling the evens function (generator) is as usual:
num = [...]
for n in evens(num):
    do_smth(n)
A generator can also be used to break out of a double loop.
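A hedged sketch of that trick: move the nested loops into a generator, then a single break in the caller stops both loops at once.

def cells(rows):
    for row in rows:
        for item in row:
            yield item

rows = [[1, 2, 3], [4, 5, 6]]
for value in cells(rows):
    if value == 5:
        break        # one break exits the whole nested iteration
    print(value)     # prints 1 2 3 4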
Iterator
A book full of pages is an iterable; a bookmark is an iterator,
and the bookmark has nothing to do except move to the next page.
litr = iter([1, 2, 3])
next(litr)  ## 1
next(litr)  ## 2
next(litr)  ## 3
next(litr)  ## StopIteration (Exception), as we reached the end of the iterator
To use a Generator ... we need a function.
To use an Iterator ... we need next and iter.
As has been said:
A Generator function returns an iterator object.
The whole benefit of an Iterator:
It stores one element at a time in memory.
No-code 4 line cheat sheet:
A generator function is a function with yield in it.
A generator expression is like a list comprehension. It uses "()" vs "[]"
A generator object (often called 'a generator') is returned by both above.
A generator is also a subtype of iterator.
Previous answers missed this addition: a generator has a close method, while typical iterators don't. The close method raises a GeneratorExit exception inside the generator, which gives any try/finally (or except) clause in the generator a chance to run some clean-up. This abstraction makes it more usable in the large than simple iterators: one can close a generator as one would close a file, without having to bother about what's underneath.
That said, my personal answer to the first question would be: an iterable has an __iter__ method only, a typical iterator has a __next__ method only, and a generator has both __iter__ and __next__ plus an additional close.
For the second question, my personal answer would be: in a public interface, I tend to favor generators a lot, since they're more resilient: the close method and greater composability with yield from. Locally, I may use iterators, but only if it's a flat and simple structure (iterators do not compose easily) and if there are reasons to believe the sequence is rather short, especially if it may be stopped before it reaches the end. I tend to look at iterators as a low-level primitive, except as literals.
For control-flow matters, generators are as important a concept as promises: both are abstract and composable.
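A small sketch of the close() behaviour described above (the names are illustrative):

def reader():
    print("open resource")
    try:
        while True:
            yield "data"
    finally:
        print("clean up")   # runs when close() raises GeneratorExit at the paused yield

r = reader()
next(r)     # prints "open resource", yields "data"
r.close()   # prints "clean up"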
It's difficult to answer the question without 2 other concepts: iterable and iterator protocol.
What is difference between iterator and iterable?
Conceptually you iterate over an iterable with the help of the corresponding iterator. There are a few differences that can help to distinguish an iterator from an iterable in practice:
One difference is that an iterator has a __next__ method, while an iterable does not.
Another difference is that both of them have an __iter__ method: for an iterable it returns the corresponding iterator, while for an iterator it returns the iterator itself.
>>> x = [1, 2, 3]
>>> dir(x)
[... __iter__ ...]
>>> x_iter = iter(x)
>>> dir(x_iter)
[... __iter__ ... __next__ ...]
>>> type(x_iter)
<class 'list_iterator'>
What are iterables in Python? list, string, range etc. What are iterators? enumerate, zip, reversed etc. We may check this using the approach above. It's kind of confusing; it would probably be easier if we had only one type. Is there any difference between range and zip? One reason for the split: range has a lot of additional functionality - we may index it or check whether it contains some number, etc. (see details here).
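To make that concrete, a quick comparison of range (an iterable with extras) and zip (a plain iterator):

r = range(10)
print(len(r))    # 10 -- ranges know their length
print(r[3])      # 3  -- and support indexing
print(7 in r)    # True -- and membership tests, all without producing every value

z = zip([1, 2], [3, 4])
print(next(z))   # (1, 3) -- zip is just an iterator: no len(), no indexing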
How can we create an iterator ourselves? Theoretically we may implement the iterator protocol (see here): write __next__ and __iter__ methods, raise a StopIteration exception, and so on (see Alex Martelli's answer for an example and possible motivation, see also here). But in practice we use generators. That seems to be by far the main way to create iterators in Python.
I can give you a few more interesting examples that show somewhat confusing usage of those concepts in practice:
in keras we have tf.keras.preprocessing.image.ImageDataGenerator; this class doesn't have __next__ and __iter__ methods, so it's not an iterator (or generator);
if you call its flow_from_dataframe() method you'll get a DataFrameIterator that has those methods; but it doesn't implement StopIteration (which is not common among built-in iterators in Python); in the documentation we may read that "A DataFrameIterator yielding tuples of (x, y)" - again a confusing usage of terminology;
we also have a Sequence class in keras, and that's a custom implementation of generator-like functionality (regular generators are not suitable for multithreading), but it doesn't implement __next__ and __iter__; rather, it's a wrapper around generators (it uses the yield statement);
Generator Function, Generator Object, Generator:
A Generator function is just like a regular function in Python, but it contains one or more yield statements. Generator functions are a great tool for creating Iterator objects as easily as possible. The Iterator object returned by a generator function is also called a Generator object or Generator.
In this example I have created a Generator function which returns a Generator object <generator object fib at 0x01342480>. Just like other iterators, Generator objects can be used in a for loop or with the built-in function next(), which returns the next value from the generator.
def fib(max):
    a, b = 0, 1
    for i in range(max):
        yield a
        a, b = b, a + b

print(fib(10)) #<generator object fib at 0x01342480>

for i in fib(10):
    print(i) # 0 1 1 2 3 5 8 13 21 34

myfib = fib(10)
print(next(myfib)) #0
print(next(myfib)) #1
print(next(myfib)) #1
print(next(myfib)) #2
So a generator function is the easiest way to create an Iterator object.
Iterator:
Every generator object is an iterator but not vice versa. A custom iterator object can be created if its class implements the __iter__ and __next__ methods (also called the iterator protocol).
However, it is much easier to use a generator function to create iterators, because it simplifies their creation; a custom Iterator gives you more freedom, though, and you can also implement other methods according to your requirements, as shown in the example below.
class Fib:
    def __init__(self, max):
        self.current = 0
        self.next = 1
        self.max = max
        self.count = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.count > self.max:
            raise StopIteration
        else:
            self.current, self.next = self.next, (self.current + self.next)
            self.count += 1
            return self.next - self.current

    def __str__(self):
        return "Generator object"

itobj = Fib(4)
print(itobj) #Generator object

for i in Fib(4):
    print(i) #0 1 1 2

print(next(itobj)) #0
print(next(itobj)) #1
print(next(itobj)) #1
This thread covers the differences between the two in much detail, but I wanted to add something on the conceptual difference between them:
[...] an iterator as defined in the GoF book retrieves items from a collection, while a generator can produce items “out of thin air”. That’s why the Fibonacci sequence generator is a common example: an infinite series of numbers cannot be stored in a collection.
Ramalho, Luciano. Fluent Python (p. 415). O'Reilly Media. Kindle Edition.
Sure, it does not cover all the aspects but I think it gives a good notion when one can be useful.
You can compare both approaches for the same data:
def myGeneratorList(n):
    for i in range(n):
        yield i

def myIterableList(n):
    ll = n*[None]
    for i in range(n):
        ll[i] = i
    return ll

# Same values
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
for i1, i2 in zip(ll1, ll2):
    print("{} {}".format(i1, i2))

# Generator can only be read once
ll1 = myGeneratorList(10)
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))

# Generator can be read several times if converted into iterable
ll1 = list(myGeneratorList(10))
ll2 = myIterableList(10)
print("{} {}".format(len(list(ll1)), len(ll2)))
print("{} {}".format(len(list(ll1)), len(ll2)))
Besides, if you check the memory footprint, the generator takes much less memory as it doesn't need to store all the values in memory at the same time.
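A quick, rough way to see that difference (exact sizes vary by Python version and platform):

import sys

print(sys.getsizeof(list(range(1000000))))       # megabytes: the list holds a slot for every element
print(sys.getsizeof(x for x in range(1000000)))  # a couple of hundred bytes: the generator holds only its state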
An iterable object is something which can be iterated (naturally). To do that, however, you will need something like an iterator object, and, yes, the terminology may be confusing. Iterable objects include a __iter__ method which will return the iterator object for the iterable object.
An iterator object is an object which implements the iterator protocol - a set of rules. In this case, it must have at least these two methods: __iter__ and __next__. The __next__ method is a function which supplies a new value. The __iter__ method returns the iterator object. In a more complex object, there may be a separate iterator, but in a simpler case, __iter__ returns the object itself (typically return self).
One iterable object is a list object. It’s not an iterator, but it has an __iter__ method which returns an iterator. You can call this method directly as things.__iter__(), or use iter(things).
If you want to iterate through any collection, you will need to use its iterator:
things_iterator = iter(things)
for i in things_iterator:
print(i)
However, Python will automatically use the iterator, which is why you never see the above example. Instead you write:
for i in things:
print(i)
Writing an iterator yourself can be tedious, so Python has a simpler alternative: the generator function. A generator function is not an ordinary function. Instead of running through the code and returning a final result, the code is deferred, and the function returns immediately with a generator object.
A generator object is like an iterator object in that it implements the iterator protocol. That’s good enough for most purposes. There are many examples of generators in the other answers.
In short, an iterator is an object which allows you to iterate through another object, whether it’s a collection or some other source of values. A generator is a simplified iterator which does more-or-less the same job, but is easier to implement.
Normally, you would go for a generator if that’s all you need. If, however, you’re building a more complex object which includes iteration among other features, you would use the iterator protocol instead.
I am writing specifically for Python newbies in a very simple way, though deep down Python does so many things.
Let’s start with the very basic:
Consider a list,
l = [1,2,3]
Let’s write an equivalent function:
def f():
    return [1,2,3]

Output of print(l): [1,2,3]
Output of print(f()): [1,2,3]
Let's make list l iterable: in Python a list is always iterable, which means you can apply an iterator to it whenever you want.
Let’s apply iterator on list:
iter_l = iter(l) # iterator applied explicitly
Let’s make a function iterable, i.e. write an equivalent generator function.
In python as soon as you introduce the keyword yield; it becomes a generator function and iterator will be applied implicitly.
Note: Every generator is always iterable with implicit iterator applied and here implicit iterator is the crux
So the generator function will be:
def f():
    yield 1
    yield 2
    yield 3

iter_f = f() # which is iter(f) as the iterator is already applied implicitly
So, as you may have observed, as soon as you make function f a generator, it is already iter(f).
Now,
l is the list; after applying the iterator method "iter" it becomes iter(l).
f is already iter(f); after applying the iterator method "iter" it becomes iter(iter(f)), which is again iter(f).
It's a bit like casting an int x to int(x): it is already an int and will remain int(x).
For example o/p of :
print(type(iter(iter(l))))
is
<class 'list_iterator'>
Never forget this is Python and not C or C++
Hence the conclusion from the above explanation is:
list l ~= iter(l)
generator function f == iter(f)
All generators are iterators but not vice versa.
from typing import Iterator
from typing import Iterable
from typing import Generator

class IT:
    def __init__(self):
        self.n = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.n == 4:
            raise StopIteration
        try:
            return self.n
        finally:
            self.n += 1

def g():
    for i in range(4):
        yield i

def test(it):
    print(f'type(it) = {type(it)}')
    print(f'isinstance(it, Generator) = {isinstance(it, Generator)}')
    print(f'isinstance(it, Iterator) = {isinstance(it, Iterator)}')
    print(f'isinstance(it, Iterable) = {isinstance(it, Iterable)}')
    print(next(it))
    print(next(it))
    print(next(it))
    print(next(it))
    try:
        print(next(it))
    except StopIteration:
        print('boom\n')

print(f'issubclass(Generator, Iterator) = {issubclass(Generator, Iterator)}')
print(f'issubclass(Iterator, Iterable) = {issubclass(Iterator, Iterable)}')
print()

test(IT())
test(g())
Output:
issubclass(Generator, Iterator) = True
issubclass(Iterator, Iterable) = True
type(it) = <class '__main__.IT'>
isinstance(it, Generator) = False
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom
type(it) = <class 'generator'>
isinstance(it, Generator) = True
isinstance(it, Iterator) = True
isinstance(it, Iterable) = True
0
1
2
3
boom

Efficient list manipulation in python

I have a large list and regularly need to find an item satisfying a rather complex condition (not equality), i.e. I am forced to check every item in the list until I find one. The conditions change, but some items match more often than others. So I would like to bring the matching item to the front of the list each time I find one, so that frequently matching items are found more quickly.
Is there an efficient, pythonic way to do this?
Lists ([]) are backed by an array, so removing an item somewhere in the middle and prepending it to the array means moving every previous item. That's O(n) time, not good.
In C you could build a linked list and move the item on your own when found. In Python there is a deque, but afaik you cannot reference the node objects nor have access to .next pointers.
And a self-made linked list is very slow in Python. (In fact it's slower than ordinary linear search without moving any item.)
Sadly, a dict or set finds items based on value equality and thus doesn't fit my problem.
As an illustration, here's the condition:
u, v, w = n.value  # list item
if v in g[u] and w in g[v] and u not in g[w]:
    ...
Consider instead a Pythonic approach. As Ed Post once put it, "The determined Real Programmer can write FORTRAN programs in any language" -- and this generalizes... you're trying to write C in Python and it isn't working well for you:-)
Rather, think of putting an auxiliary dict cache next to the list -- caching the indices where items are found (needs to be invalidated only on "deep" changes to the list's structure). Much simpler and faster...
Probably best done by having list and dict in a small class:
class Seeker(object):
    def __init__(self, *a, **k):
        self.l = list(*a, **k)
        self.d = {}

    def find(self, value):
        where = self.d.get(value)
        if where is None:
            self.d[value] = where = self.l.index(value)  # lists have index(), not find()
        return where

    def __setitem__(self, index, value):
        if value in self.d: del self.d[value]
        self.l[index] = value

    # and so on for other mutators that invalidate self.d; then,

    def __getattr__(self, name):
        # delegate everything else to the list
        return getattr(self.l, name)
You need only define the mutators you actually need to use -- e.g., if you won't do insert, sort, __delitem__, &c, there's no need to define those; you can just delegate them to the list.
Added: in Python 3.2 or better, functools.lru_cache can actually do most of the work for you -- use it to decorate find and you'll get a better implementation of caching, with the ability to limit cache size if you so desire. To clear the cache, you'll need to call self.find.cache_clear() at the appropriate spots (where I above use self.d = {}) -- unfortunately, that crucial functionality is not (yet!-) documented (the volunteers updating the docs are not the same ones updating the code...!-)... but, trust me, it's not going to disappear on you:-).
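A sketch of one way to wire up that suggestion (the class name is illustrative; as above, you would add whatever other mutators you actually use):

from functools import lru_cache

class CachedSeeker(object):
    def __init__(self, *a, **k):
        self.l = list(*a, **k)
        # per-instance cache around list.index; lru_cache supplies cache_clear()
        self.find = lru_cache(maxsize=None)(self.l.index)

    def __setitem__(self, index, value):
        self.find.cache_clear()   # invalidate the cache on mutation
        self.l[index] = value

    def __getattr__(self, name):
        # delegate everything else to the list
        return getattr(self.l, name)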
Added: the OP edited the Q to clarify that he's not after "value equality", but rather some more complex set of conditions, exemplified by a predicate such as:
def good_for_g(g, n):
    # for some container `g` and item value `n`:
    u, v, w = n.value
    return v in g[u] and w in g[v] and u not in g[w]
Presumably, then, the desire to bring "good" items towards the front is in turn predicated on their "goodness" being "sticky", i.e. g staying pretty much the same for a while. In this case, one can use the predicate itself as a feature-extraction-and-checking function, which forms the key into the dictionary -- so for example:
class FancySeeker(object):
    def __init__(self, *a, **k):
        self.l = list(*a, **k)
        self.d = {}

    def _find_in_list(self, predicate):
        for i, n in enumerate(self.l):
            if predicate(n):
                return i
        return -1

    def find(self, predicate):
        where = self.d.get(predicate)
        if where is None:
            where = self._find_in_list(predicate)
            self.d[predicate] = where
        return where
and so forth.
So the remaining difficulty is to put predicate in a form suitable for effective indexing into a dict. If predicate is just a function, no problem. But if predicate is a function with parameters, as formed e.g. by functools.partial or as a bound method of some instance, that requires a bit of further processing/wrapping to make the indexing work.
Two calls to functools.partial with the same bound argument(s) and function, for example, do not return equal objects -- one has, rather, to inspect the .args and .func of the returned objects to ensure, so to speak, that a "singleton" is returned for any given (func, args) pair.
Moreover, if some of the bound arguments are mutable, one needs to use their id in lieu of their hash (or else the raw functools.partial object would not be hashable). It gets even hairier for bound methods, though they can similarly be wrapped into e.g. a hashable, "equality adjusted" Predicate class.
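A rough sketch of such an "equality adjusted" key, assuming the bound arguments are hashable; two partials built from the same function and arguments then map to the same dictionary slot:

from functools import partial

def predicate_key(pred):
    # partial objects with identical func/args don't compare equal, so key on their parts
    if isinstance(pred, partial):
        return (pred.func, pred.args, frozenset(pred.keywords.items()))
    return pred   # plain functions already work as dict keys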
Lastly, if these gyrations prove too cumbersome and you really want a fast implementation of a linked list instead, look at https://pypi.python.org/pypi/llist/0.4 -- it's a C-coded implementation of singly and doubly linked lists for Python (for each kind, it implements three types: the list itself, the list node, and the list's iterator).
You can do exactly what you want using deque.rotate.
from collections import deque

class Collection:
    "Linked List collection that moves searched for items to the front of the collection"

    def __init__(self, seq):
        self._deque = deque(seq)

    def __contains__(self, target):
        for i, item in enumerate(self._deque):
            if item == target:
                self._deque.rotate(-i)        # bring the matching item to the front
                self._deque.popleft()         # remove it
                self._deque.rotate(i)         # restore the order of the remaining items
                self._deque.appendleft(item)  # reinsert the item at the front
                return True
        return False

    def __str__(self):
        return "Collection({})".format(str(self._deque))
c = Collection(range(10))
print(c)
print("5 in c:", 5 in c)
print(c)
Gives the following output:
Collection(deque([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]))
5 in c: True
Collection(deque([5, 0, 1, 2, 3, 4, 6, 7, 8, 9]))

restartable generator available in Python Standard Library?

I can't imagine that I'm the first to write a class like this:
class RestartableGenerator:
    def __init__(self, g):
        self.g = g

    def __iter__(self):
        return self.g().__iter__()

if __name__ == '__main__':
    def gen():
        print 'Generating'
        for i in range(5):
            yield i

    i = RestartableGenerator(gen)
    print 'Using'
    print list(i)
    print list(i)
The test produces this output:
Using
Generating
[0, 1, 2, 3, 4]
Generating
[0, 1, 2, 3, 4]
But I didn't find it in the Standard Library. I looked in itertools and functools.
Is it really not there? If it is, where?
Was it considered unnecessary, because when you want to evaluate a sequence multiple times, you'd better store it in a list?
Edit 1:
My use case is that I want it to be transparent to the consumer that the sequence is, for memory-consumption reasons, a generator instead of a list.
Edit 2: If there's no such class in the Standard Library, what name do you think is appropriate? ParenthesisRemover? MultipleTimesIterable? Anything else? Why?
You could simplify it a little:
class RestartableGenerator:
    def __init__(self, g):
        self.g = g

    def __iter__(self):
        return self.g()
Calling gen() returns a generator object. A generator object has a next method, so it is exactly the kind of object that __iter__ must return.
There is no need for RestartableGenerator, however, since it does nothing that gen itself cannot do. Instead of holding gen in an attribute, just hold on to gen itself.
def gen():
    print 'Generating'
    for i in range(5):
        yield i

print 'Using'
print list(gen())
print list(gen())
This isn't exactly restartable in the sense of going back to the beginning of the sequence. Each __iter__ call creates a new generator that will rerun the generator code, potentially reexecuting side effects and producing different results. If you want independent iterators over a generated sequence, that's what list or itertools.tee are for. Otherwise, it's clearer to explicitly call the generator function again, so this isn't very useful. You save a pair of parentheses at the cost of less explicit, more bug-prone code.
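For contrast, this is what independent iterators over one generated sequence look like with itertools.tee (written for Python 3 here; tee buffers the items instead of rerunning the generator body):

import itertools

def gen():
    print('Generating')
    for i in range(5):
        yield i

a, b = itertools.tee(gen())
print(list(a))   # Generating, then [0, 1, 2, 3, 4]
print(list(b))   # [0, 1, 2, 3, 4] -- same items, without 'Generating' being printed again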
Note that if you want a lazy sequence type, where iterating over it generates elements on the fly, but you can iterate over it repeatedly, you should define its __iter__ method as a generator:
import itertools

class Primes(object):
    def __iter__(self):
        for i in itertools.count():
            if is_prime(i):
                yield i
This isn't a "restartable generator", but it sounds like what you want.
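A hedged usage sketch; is_prime is not defined in the answer, so a naive version is filled in here purely for illustration:

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))

print(list(itertools.islice(Primes(), 5)))   # [2, 3, 5, 7, 11]
print(list(itertools.islice(Primes(), 5)))   # [2, 3, 5, 7, 11] again: iteration is repeatable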

How to len(generator()) [duplicate]

This question already has answers here:
Length of generator output [duplicate]
What's the shortest way to count the number of items in a generator/iterator?
Python generators are very useful. They have advantages over functions that return lists. However, you could len(list_returning_function()). Is there a way to len(generator_function())?
UPDATE:
Of course len(list(generator_function())) would work.....
I'm trying to use a generator I've created inside a new generator I'm creating. As part of the calculation in the new generator it needs to know the length of the old one. However I would like to keep both of them together with the same properties as a generator, specifically - not maintain the entire list in memory as it may be very long.
UPDATE 2:
Assume the generator knows its target length even from the first step. Also, there's no reason to maintain the len() syntax. Example - if functions in Python are objects, couldn't I assign the length to a variable of this object that would be accessible to the new generator?
The conversion to list that's been suggested in the other answers is the best way if you still want to process the generator elements afterwards, but has one flaw: It uses O(n) memory. You can count the elements in a generator without using that much memory with:
sum(1 for x in generator)
Of course, be aware that this might be slower than len(list(generator)) in common Python implementations, and if the generators are long enough for the memory complexity to matter, the operation would take quite some time. Still, I personally prefer this solution as it describes what I want to get, and it doesn't give me anything extra that's not required (such as a list of all the elements).
Also listen to delnan's advice: If you're discarding the output of the generator it is very likely that there is a way to calculate the number of elements without running it, or by counting them in another manner.
Generators have no length, they aren't collections after all.
Generators are functions with an internal state (and fancy syntax). You can repeatedly call them to get a sequence of values, so you can use them in a loop. But they don't contain any elements, so asking for the length of a generator is like asking for the length of a function.
if functions in Python are objects, couldn't I assign the length to a
variable of this object that would be accessible to the new generator?
Functions are objects and can in fact take new attributes, but generator objects cannot (they have no __dict__). The reason is probably to keep such a basic object as efficient as possible.
You can however simply return (generator, length) pairs from your functions or wrap the generator in a simple object like this:
class GeneratorLen(object):
    def __init__(self, gen, length):
        self.gen = gen
        self.length = length

    def __len__(self):
        return self.length

    def __iter__(self):
        return self.gen

g = some_generator()
h = GeneratorLen(g, 1)
print len(h), list(h)
Suppose we have a generator:
def gen():
    for i in range(10):
        yield i
We can wrap the generator, along with the known length, in an object:
import itertools

class LenGen(object):
    def __init__(self, gen, length):
        self.gen = gen
        self.length = length

    def __call__(self):
        return itertools.islice(self.gen(), self.length)

    def __len__(self):
        return self.length

lgen = LenGen(gen, 10)
Instances of LenGen behave like generator functions, since calling them returns an iterator.
Now we can use the lgen generator in place of gen, and access len(lgen) as well:
def new_gen():
    for i in lgen():
        yield float(i)/len(lgen)

for i in new_gen():
    print(i)
You can use len(list(generator_function())). However, this consumes the generator, and it's the only way to find out how many elements are generated. So you may want to save the list somewhere if you also want to use the items.
a = list(generator_function())
print(len(a))
print(a[0])
You can len(list(generator)) but you could probably make something more efficient if you really intend to discard the results.
You can use reduce.
For Python 3:
>>> import functools
>>> def gen():
...     yield 1
...     yield 2
...     yield 3
...
>>> functools.reduce(lambda x,y: x + 1, gen(), 0)
3
In Python 2, reduce is in the global namespace so the import is unnecessary.
You can use send as a hack:
def counter():
    length = 10
    i = 0
    while i < length:
        val = (yield i)
        if val == 'length':
            yield length
        i += 1
it = counter()
print(it.next())
#0
print(it.next())
#1
print(it.send('length'))
#10
print(it.next())
#2
print(it.next())
#3
You can combine the benefits of generators with the certainty of len(), by creating your own iterable object:
class MyIterable(object):
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __iter__(self):
        self._gen = self._generator()
        return self

    def _generator(self):
        # Put your generator code here
        i = 0
        while i < self.n:
            yield i
            i += 1

    def next(self):
        return next(self._gen)

mi = MyIterable(100)
print len(mi)
for i in mi:
    print i,
This is basically a simple implementation of xrange, which returns an object you can take the len of, but doesn't create an explicit list.
