In a Python iterator, select another iterator based on conditions

In Python I have an iterator, called Sampler, that returns an infinite stream of indices in a fixed range [0, N]. Actually I have a list of those, and all they do is return indices in the ranges [0, N_0], [N_0, N_1], ..., [N_{n-1}, N_n].
What I now want to do is first select one of these iterators based on the length of its range, so I have a weights list [N_0, N_1 - N_0, ...] and I select one of them with:
iterator_idx = random.choices(range(len(weights)), weights=weights/weights.sum())[0]
Next, I want to create an iterator which randomly selects one of the iterators and yields a batch of M samples from it.
class BatchSampler:
    def __init__(self, M):
        self.M = M
        self.weights = [weight_list]
        self.samplers = [list_of_iterators]
        self._batch_samplers = [
            self.batch_sampler(sampler) for sampler in self.samplers
        ]

    def batch_sampler(self, sampler):
        batch = []
        for batch_idx in sampler:
            batch.append(batch_idx)
            if len(batch) == self.M:
                yield batch
        if len(batch) > 0:
            yield batch

    def __iter__(self):
        # First select one of the datasets.
        iterator_idx = random.choices(
            range(len(self.weights)), weights=self.weights / self.weights.sum()
        )[0]
        return self._batch_samplers[iterator_idx]
The issue with this is that __iter__() only seems to be called once, so iterator_idx is only selected the first time. Obviously this is wrong... What is the way around this?
This is a possible case when you have multiple datasets in PyTorch, but you want each batch to be sampled from only one of the datasets.
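One way around it (a minimal sketch of a fix, not from the original thread; it assumes each sampler is an infinite iterator, and the samplers/weights/M constructor parameters stand in for the placeholders above) is to make __iter__ itself a generator function, so the owning sampler is re-drawn for every batch:
import random

class BatchSampler:
    def __init__(self, samplers, weights, M):
        self.M = M
        self.samplers = samplers  # list of infinite index iterators
        self.weights = weights    # e.g. [N_0, N_1 - N_0, ...]

    def __iter__(self):
        # Because of the yield below, every call to iter() returns a fresh
        # generator, and a sampler is re-drawn for every single batch.
        while True:
            sampler = random.choices(self.samplers, weights=self.weights)[0]
            yield [next(sampler) for _ in range(self.M)]
Note that random.choices normalizes the weights itself, so dividing by their sum is unnecessary.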

Seems to me that you want to define your own container type. I'll try to provide examples of a few standard ways to do so (hopefully without missing too many details); you should be able to reuse one of these simple examples in your own class.
Using just __getitem__ (supports indexing & looping):
object.__getitem__
Called to implement evaluation of self[key].
class MyContainer:
    def __init__(self, sequence):
        self.elements = sequence  # Just something to work with.

    def __getitem__(self, key):
        # If we're delegating to sequences like built-in list,
        # invalid indices are handled automatically by them
        # (throwing IndexError, as per the documentation).
        return self.elements[key]
t = (1, 2, 'a', 'b')
c = MyContainer(t)
elems = [e for e in c]
assert elems == [1, 2, 'a', 'b']
assert c[1:-1] == t[1:-1] == (2, 'a')
Using the iterator protocol:
object.__iter__
object.__iter__(self)
This method is called when an iterator is required for a container. This method should return a new iterator object that can iterate over all the objects in the container. For mappings, it should iterate over the keys of the container.
Iterator objects also need to implement this method; they are required to return themselves. For more information on iterator objects, see Iterator Types.
Iterator Types
container.__iter__()
Return an iterator object. The object is required to support the iterator protocol described below.
The iterator objects themselves are required to support the following two methods, which together form the iterator protocol:
iterator.__iter__()
Return the iterator object itself. This is required to allow both containers and iterators to be used with the for and in statements.
iterator.__next__()
Return the next item from the container. If there are no further items, raise the StopIteration exception.
Once an iterator's __next__() method raises StopIteration, it must continue to do so on subsequent calls.
class MyContainer:
    class Iter:
        def __init__(self, container):
            self.cont = container
            self.pos = 0
            self.len = len(container.elements)

        def __iter__(self): return self

        def __next__(self):
            if self.pos == self.len: raise StopIteration
            curElem = self.cont.elements[self.pos]
            self.pos += 1
            return curElem

    def __init__(self, sequence):
        self.elements = sequence  # Just something to work with.

    def __iter__(self):
        return MyContainer.Iter(self)
t = (1, 2, 'a', 'b')
c = MyContainer(t)
elems = [e for e in c]
assert elems == [1, 2, 'a', 'b']
Using a generator:
Generator Types
Python's generators provide a convenient way to implement the iterator protocol. If a container object's __iter__() method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the __iter__() and __next__() methods.
generator
A function which returns a generator iterator. It looks like a normal function except that it contains yield expressions for producing a series of values usable in a for-loop or that can be retrieved one at a time with the next() function.
Usually refers to a generator function, but may refer to a generator iterator in some contexts.
generator iterator
An object created by a generator function.
6.2.9. Yield expressions
Using a yield expression in a function's body causes that function to be a generator
class MyContainer:
    def __init__(self, sequence):
        self.elements = sequence  # Just something to work with.

    def __iter__(self):
        for e in self.elements: yield e
t = (1, 2, 'a', 'b')
c = MyContainer(t)
elems = [e for e in c]
assert elems == [1, 2, 'a', 'b']

Related

If an iterator is created on some structure, which structure does it iterate over when next is called?

For example, an iterator is created like this:
Iter1 = iter(list)
This iterator object is created over the list, and next will be called on it.
The filter() function also returns an iterator, but which underlying structure is that iterator created over? In the case of filter(), which kind of data structure is the returned iterator attached to?
filter is a type that implements the iterator protocol. To create an instance of filter, you give it a predicate (or None to indicate the constant predicate lambda x: True) and an iterable. An iterator for the iterable value is created and stored along with the predicate.
When iterating over the filter instance, it yields only those values from its internal iterator that match the predicate. A pure Python implementation might look like
class filter:
    def __init__(self, p, itr):
        self.itr = iter(itr)
        self.p = p if p is not None else lambda x: True

    def __iter__(self):
        return self

    def __next__(self):
        while True:
            x = next(self.itr)
            if self.p(x):
                return x
The call to iter in filter.__init__ is how you can filter an iterable (like a list) that is not itself an iterator.
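A quick usage sketch of this pure-Python filter (my example, not part of the original answer): the list is the underlying structure, and the filter object only ever holds the list's iterator:
evens = filter(lambda x: x % 2 == 0, [1, 2, 3, 4, 5])
print(list(evens))  # [2, 4] -- values pulled on demand from the list's iterator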
Note that while all filter instances wrap some other iterator, not all iterators wrap another data structure. For example, the following class also implements the iterator protocol to produce an infinite stream of 1s.
class Ones:
    def __iter__(self):
        return self

    def __next__(self):
        return 1
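Since Ones never raises StopIteration, looping over it directly would never terminate; itertools.islice is one way to take a finite number of items (a small usage sketch):
from itertools import islice

print(list(islice(Ones(), 5)))  # [1, 1, 1, 1, 1]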

Iterators on sequences

I want to ask how the iterator in Python is designed. I have read that when an iterator is used, it returns single values from the sequence of the iterable it represents upon each call to __next__ or next(), and this way it does not need to copy the full iterable and therefore occupy memory. How exactly can you return single values of a sequence object like a list or string, or of mapping objects like dictionaries? Does it store a pointer to the original sequence data structure's contents, and then have a method called __next__ which increments the pointer?
Thanks
An iterator is just a class that defines __next__; that's it (though all iterators should also be iterable, by defining __iter__ = lambda self: self as well). It's entirely how __next__ is defined that determines the behavior of the iterator.
You can create an iterator over a constant sequence:
class Ones:
    def __iter__(self):
        return self

    def __next__(self):
        return 1
or you can create an iterator that walks through some iterable value. It's the iterator that keeps track of which value to return next, from the values supplied by the iterable. Here's a simplified version of list_iterator, the built-in type of the values returned by list.__iter__:
class list_iterator:
    def __init__(self, lst):
        self.lst = lst
        self.i = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.i == len(self.lst):
            raise StopIteration
        x = self.lst[self.i]
        self.i += 1
        return x

x = [1, 2, 3]
# prints 1, then 2, then 3
for v in list_iterator(x):
    print(v)

Second argument of the iter() method

I'm trying to figure out how to make an iterator; below is an iterator that works fine.
class DoubleIt:
    def __init__(self):
        self.start = 1

    def __iter__(self):
        self.max = 10
        return self

    def __next__(self):
        if self.start < self.max:
            self.start *= 2
            return self.start
        else:
            raise StopIteration
obj = DoubleIt()
i = iter(obj)
print(next(i))
However, when I try to pass 16 as the second argument to iter() (I expect the iterator to stop when it returns 16):
i = iter(DoubleIt(), 16)
print(next(i))
It throws TypeError: iter(v, w): v must be callable
Therefore, I try to do so.
i = iter(DoubleIt, 16)
print(next(i))
It returns <__main__.DoubleIt object at 0x7f4dcd4459e8>, which is not what I expected.
I checked the website of programiz, https://www.programiz.com/python-programming/methods/built-in/iter
It says that a callable object must be passed as the first argument in order to use the second argument, but it doesn't mention whether a user-defined object can be passed in order to use it.
So my question is: is there a way to do so? Can the second argument be used with a user-defined object?
The documentation could be a bit clearer on this, it only states
iter(object[, sentinel])
...
The iterator created in this case will call object with no arguments
for each call to its __next__() method; if the value returned is equal to sentinel, StopIteration will be raised, otherwise the value will be returned.
What is maybe not said perfectly clearly is that what the iterator yields is whatever the callable returns. And since your callable is a class (with no arguments), it returns a new instance of the class every iteration.
One way around this is to make your class callable, delegating the call to the __next__ method:
class DoubleIt:
    def __init__(self):
        self.start = 1

    def __iter__(self):
        return self

    def __next__(self):
        self.start *= 2
        return self.start

    __call__ = __next__
i = iter(DoubleIt(), 16)
print(next(i))
# 2
print(list(i))
# [4, 8]
This has the disadvantage (or advantage) that it is an infinite iterator that is only stopped by the sentinel value given to iter.
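A related caveat (my aside, not from the original answer): the sentinel has to compare equal to a value the callable actually produces, otherwise iteration never stops. With the doubler above, a sentinel of 15 is never hit, so something like itertools.islice makes a useful guard:
from itertools import islice

i = iter(DoubleIt(), 15)   # 15 is never produced: 2, 4, 8, 16, 32, ...
print(list(islice(i, 5)))  # [2, 4, 8, 16, 32] -- capped; list(i) would never return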
Another way is to make the maximum an argument of the class:
class DoubleIt:
    def __init__(self, max=10):
        self.start = 1
        self.max = max

    def __iter__(self):
        return self

    def __next__(self):
        if self.start < self.max:
            self.start *= 2
            return self.start
        else:
            raise StopIteration
i = iter(DoubleIt(max=16))
print(next(i))
# 2
print(list(i))
# [4, 8, 16]
One difference to note is that iter stops when it encounters the sentinel value (and does not yield that item), whereas this second way uses < comparison (like your code) rather than <=, and will thus yield the maximum item, 16.
Here's an example of a doubler routine that would work with the two argument mode of iter:
count = 1

def nextcount():
    global count
    count *= 2
    return count
print(list(iter(nextcount, 16)))
# Produces [2, 4, 8]
This mode involves iter creating the iterator for us. Note that we need to reset count before it can work again; it only works given a callable (such as a function or bound method) that has side effects (changing the counter), and the iterator will only stop upon encountering exactly the sentinel value.
Your DoubleIt class provided no particular protocol for setting a max value, and iter doesn't expect or use any such protocol either. The alternate mode of iter creates an iterator from a callable and a sentinel value, quite independent of the iterable or iterator protocols.
The behaviour you expected is more akin to what itertools.takewhile or itertools.islice do, manipulating one iterator to create another.
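For instance (a sketch reusing the infinite, callable DoubleIt from the previous answer; itertools.takewhile cuts the stream off by predicate rather than by sentinel):
from itertools import islice, takewhile

print(list(takewhile(lambda x: x < 16, DoubleIt())))  # [2, 4, 8], like the sentinel
print(list(islice(DoubleIt(), 4)))                    # [2, 4, 8, 16], first four items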
Another way to make an iterable object is to implement the sequence protocol:
class DoubleSeq:
    def __init__(self, steps):
        self.steps = steps

    def __len__(self):
        return self.steps

    def __getitem__(self, iteration):
        if iteration >= self.steps:
            raise IndexError()
        return 2**iteration
print(list(iter(DoubleSeq(4))))
# Produces [1, 2, 4, 8]
Note that DoubleSeq isn't an iterator at all; iter created one for us using the sequence protocol. DoubleSeq doesn't hold the iteration counter, the iterator does.

How can I have multiple iterators over a single python iterable at the same time?

I would like to compare all elements in my iterable object combinatorially with each other. The following reproducible example just mimics the functionality of a plain list, but demonstrates my problem. In this example with a list of ["A","B","C","D"], I would like to get the following 16 lines of output, every combination of each item with each other. A list of 100 items should generate 100*100 = 10,000 lines.
A A True
A B False
A C False
... 10 more lines ...
D B False
D C False
D D True
The following code seemed like it should do the job.
class C():
    def __init__(self):
        self.stuff = ["A", "B", "C", "D"]

    def __iter__(self):
        self.idx = 0
        return self

    def __next__(self):
        self.idx += 1
        if self.idx > len(self.stuff):
            raise StopIteration
        else:
            return self.stuff[self.idx - 1]

thing = C()
for x in thing:
    for y in thing:
        print(x, y, x==y)
But after finishing the y-loop, the x-loop seems done, too, even though it's only used the first item in the iterable.
A A True
A B False
A C False
A D False
After much searching, I eventually tried the following code, hoping that itertools.tee would allow me two independent iterators over the same data:
import itertools
thing = C()
thing_one, thing_two = itertools.tee(thing)
for x in thing_one:
    for y in thing_two:
        print(x, y, x==y)
But I got the same output as before.
The real-world object this represents is a model of a directory and file structure with varying numbers of files and subdirectories, at varying depths into the tree. It has nested links to thousands of members and iterates correctly over them once, just like this example. But it also does expensive processing within its many internal objects on-the-fly as needed for comparisons, which would end up doubling the workload if I had to make a complete copy of it prior to iterating. I would really like to use multiple iterators, pointing into a single object with all the data, if possible.
Edit on answers: The critical flaw in the question code, pointed out in all answers, is the single internal self.idx variable being unable to handle multiple callers independently. The accepted answer is the best for my real class (oversimplified in this reproducible example), another answer presents a simple, elegant solution for simpler data structures like the list presented here.
It's actually impossible to make a container class that is its own iterator. The container shouldn't know about the state of the iterator, and the iterator doesn't need to know the contents of the container; it just needs to know which object is the corresponding container and "where" it is. If you mix iterator and container, different iterators will share state with each other (in your case the self.idx), which will not give the correct results (they read and modify the same variable).
That's the reason why all built-in types have a separate iterator class (and some even have a reverse-iterator class):
>>> l = [1, 2, 3]
>>> iter(l)
<list_iterator at 0x15e360c86d8>
>>> reversed(l)
<list_reverseiterator at 0x15e360a5940>
>>> t = (1, 2, 3)
>>> iter(t)
<tuple_iterator at 0x15e363fb320>
>>> s = '123'
>>> iter(s)
<str_iterator at 0x15e363fb438>
So, basically you could just return iter(self.stuff) in __iter__ and drop the __next__ altogether because list_iterator knows how to iterate over the list:
class C:
    def __init__(self):
        self.stuff = ["A", "B", "C", "D"]

    def __iter__(self):
        return iter(self.stuff)

thing = C()
for x in thing:
    for y in thing:
        print(x, y, x==y)
prints 16 lines, as expected.
If your goal is to make your own iterator class, you need two classes (or 3 if you want to implement the reversed-iterator yourself).
class C:
    def __init__(self):
        self.stuff = ["A", "B", "C", "D"]

    def __iter__(self):
        return C_iterator(self)

    def __reversed__(self):
        return C_reversed_iterator(self)

class C_iterator:
    def __init__(self, parent):
        self.idx = 0
        self.parent = parent

    def __iter__(self):
        return self

    def __next__(self):
        self.idx += 1
        if self.idx > len(self.parent.stuff):
            raise StopIteration
        else:
            return self.parent.stuff[self.idx - 1]

thing = C()
for x in thing:
    for y in thing:
        print(x, y, x==y)
works as well.
For completeness, here's one possible implementation of the reversed-iterator:
class C_reversed_iterator:
    def __init__(self, parent):
        self.parent = parent
        self.idx = len(parent.stuff) + 1

    def __iter__(self):
        return self

    def __next__(self):
        self.idx -= 1
        if self.idx <= 0:
            raise StopIteration
        else:
            return self.parent.stuff[self.idx - 1]

thing = C()
for x in reversed(thing):
    for y in reversed(thing):
        print(x, y, x==y)
Instead of defining your own iterators you could use generators. One way was already shown in the other answer:
class C:
    def __init__(self):
        self.stuff = ["A", "B", "C", "D"]

    def __iter__(self):
        yield from self.stuff

    def __reversed__(self):
        yield from self.stuff[::-1]
or explicitly delegate to a generator function (that's actually equivalent to the above, but maybe makes it clearer that a new object is produced):
def C_iterator(obj):
    for item in obj.stuff:
        yield item

def C_reverse_iterator(obj):
    for item in obj.stuff[::-1]:
        yield item

class C:
    def __init__(self):
        self.stuff = ["A", "B", "C", "D"]

    def __iter__(self):
        return C_iterator(self)

    def __reversed__(self):
        return C_reverse_iterator(self)
Note: You don't have to implement the __reversed__ iterator. That was just meant as additional "feature" of the answer.
Your __iter__ is completely broken. Instead of actually making a fresh iterator on every call, it just resets some state on self and returns self. That means you can't actually have more than one iterator at a time over your object, and any call to __iter__ while another loop over the object is active will interfere with the existing loop.
You need to actually make a new object. The simplest way to do that is to use yield syntax to write a generator function. The generator function will automatically return a new iterator object every time:
class C(object):
    def __init__(self):
        self.stuff = ['A', 'B', 'C', 'D']

    def __iter__(self):
        for thing in self.stuff:
            yield thing
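As a quick check (my addition, not part of the original answer), each call to iter() on such an object now produces a fresh generator with its own position:
thing = C()
it1, it2 = iter(thing), iter(thing)
print(next(it1), next(it1))  # A B -- it1 has advanced
print(next(it2))             # A   -- it2 is unaffected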

python generator of generators?

I wrote a class that reads a txt file. The file is composed of blocks of non-empty lines (let's call them "sections"), separated by an empty line:
line1.1
line1.2
line1.3

line2.1
line2.2
My first implementation was to read the whole file and return a list of lists, that is a list of sections, where each section is a list of lines.
This was obviously terrible memory-wise.
So I re-implemented it as a generator of lists, that is at every cycle my class reads a whole section in memory as a list and yields it.
This is better, but it's still problematic in case of large sections. So I wonder if I can reimplement it as a generator of generators? The problem is that this class is very generic, and it should be able to satisfy both of these use cases:
read a very big file, containing very big sections, and cycle through it only once. A generator of generators is perfect for this.
read a smallish file into memory to be cycled over multiple times. A generator of lists works fine, because the user can just invoke
list(MyClass(file_handle))
However, a generator of generators would NOT work in case 2, as the inner objects would not be transformed to lists.
Is there anything more elegant than implementing an explicit to_list() method, that would transform the generator of generators into a list of lists?
Python 2:
map(list, generator_of_generators)
Python 3:
list(map(list, generator_of_generators))
or for both:
[list(gen) for gen in generator_of_generators]
Since the generated objects are generator functions, not mere generators, you'd want to do
[list(gen()) for gen in generator_of_generator_functions]
If that doesn't work I have no idea what you're asking. Also, why would it return a generator function and not a generator itself?
Since in the comments you said you wanted to avoid list(generator_of_generator_functions) from crashing mysteriously, this depends on what you really want.
It is not possible to overwrite the behaviour of list in this way: either you store the sub-generator's elements or you don't.
If you really do get a crash, I recommend exhausting the sub-generator with the main generator loop every time the main generator iterates. This is standard practice and exactly what itertools.groupby does, a stdlib generator-of-generators.
eg.
def metagen():
    def innergen():
        yield 1
        yield 2
        yield 3

    for i in range(3):
        r = innergen()
        yield r
        for _ in r: pass
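For the original use case of line blocks separated by empty lines, itertools.groupby can do the splitting directly (a sketch, assuming lines come from an open file handle; as with any groupby, each inner group is only valid until the outer iteration advances, which is exactly the behaviour discussed above):
from itertools import groupby

def sections(lines):
    # Group runs of lines by whether they are non-empty; empty lines
    # (falsy after stripping) separate the sections and are dropped.
    for nonempty, group in groupby(lines, key=lambda line: bool(line.strip())):
        if nonempty:
            yield group  # an inner iterator over one section's lines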
Or use a dark, secret hack method that I'll show in a mo' (I need to write it), but don't do it!
As promised, the hack (for Python 3, this time 'round):
from collections import UserList
from functools import partial

def objectitemcaller(key):
    def inner(*args, **kwargs):
        try:
            return getattr(object, key)(*args, **kwargs)
        except AttributeError:
            return NotImplemented
    return inner

class Listable(UserList):
    def __init__(self, iterator):
        self.iterator = iterator
        self.iterated = False

    def __iter__(self):
        return self

    def __next__(self):
        self.iterated = True
        return next(self.iterator)

    def _to_list_hack(self):
        self.data = list(self)
        del self.iterated
        del self.iterator
        self.__class__ = UserList

for key in UserList.__dict__.keys() - Listable.__dict__.keys():
    if key not in ["__class__", "__dict__", "__module__", "__subclasshook__"]:
        setattr(Listable, key, objectitemcaller(key))
def metagen():
    def innergen():
        yield 1
        yield 2
        yield 3

    for i in range(3):
        r = Listable(innergen())
        yield r
        if not r.iterated:
            r._to_list_hack()
        else:
            for item in r: pass

for item in metagen():
    print(item)
    print(list(item))
#>>> <Listable object at 0x7f46e4a4b850>
#>>> [1, 2, 3]
#>>> <Listable object at 0x7f46e4a4b950>
#>>> [1, 2, 3]
#>>> <Listable object at 0x7f46e4a4b990>
#>>> [1, 2, 3]
list(metagen())
#>>> [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
It's so bad I don't want to even explain it.
The key is that you have a wrapper that can detect whether it has been iterated, and if not you run a _to_list_hack that, I kid you not, changes the __class__ attribute.
Because of conflicting layouts we have to use the UserList class and shadow all of its methods, which is just another layer of crud.
Basically, please don't use this hack. You can enjoy it as humour, though.
A rather pragmatic way would be to tell the "generator of generators" upon creation whether to generate generators or lists. While this is not as convenient as having list magically know what to do, it still seems to be more comfortable than having a special to_list function.
def gengen(n, listmode=False):
    for i in range(n):
        def gen():
            for k in range(i+1):
                yield k
        yield list(gen()) if listmode else gen()
Depending on the listmode parameter, this can either be used to generate generators or lists.
for gg in gengen(5, False):
    print(gg, list(gg))

print(list(gengen(5, True)))
