TL;DR: is what I'm trying to do too complicated for a yield-based generator?
I have a python application where I need to repeat an expensive test on a list of objects, one at a time, and then mangle those that pass. I expect several objects to pass, but I do not want to create a list of all those that pass, as mangle will alter the state of some of the other objects. There is no requirement to test in any particular order. Then rinse and repeat until some stop condition.
My first simple implementation was this, which runs logically correctly:
while not stop_condition:
    for object in object_list:
        if test(object):
            mangle(object)
            break
    else:
        handle_no_tests_passed()
Unfortunately, for object in object_list: always restarts at the beginning of the list, where the objects probably haven't changed, while objects at the end of the list are ready to test. Picking them at random would be slightly better, but I would rather carry on where I left off in the previous for/in call. I still want the for/in call to terminate once it has traversed the entire list.
This sounded like a job for yield, but I tied my brain in knots failing to make it do what I wanted. I can use it in the simple cases, iterating over a range or returning filtered records from some source, but I couldn't find out how to make it save state and restart reading from its source.
I can often do things the long wordy way with classes, but fail to understand how to use the alleged simplifications like yield. Here is a solution that does exactly what I want.
class CyclicSource:
    def __init__(self, source):
        self.source = source
        self.pointer = 0
    def __iter__(self):
        # reset how many we've done, but not where we are
        self.done_this_call = 0
        return self
    def __next__(self):
        ret_val = self.source[self.pointer]
        if self.done_this_call >= len(self.source):
            raise StopIteration
        self.done_this_call += 1
        self.pointer += 1
        self.pointer %= len(self.source)
        return ret_val

source = list(range(5))
q = CyclicSource(source)
print('calling once, aborted early')
count = 0
for i in q:
    count += 1
    print(i)
    if count>=2:
        break
else:
    print('ran off first for/in')
print('calling again')
for i in q:
    print(i)
else:
    print('ran off second for/in')
which demonstrates the desired behaviour
calling once, aborted early
0
1
calling again
2
3
4
0
1
ran off second for/in
Finally, the question. Is it possible to do what I want with the simplified generator syntax using yield, or does maintaining state between successive for/in calls require the full class syntax?
Your use of the __iter__ method causes your iterator to be reset. This actually goes quite counter to regular behaviour of an iterator; the __iter__ method should just return self, nothing more. You rely on a side effect of for applying iter() to your iterator each time you create a for i in q: loop. This makes your iterator work, but the behaviour is surprising and will trip up future maintainers. I'd prefer that effect to be split out to a separate .reset() method, for example.
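A minimal sketch of that suggestion, assuming a hypothetical reset() method (the name is not from the question): __iter__ only returns self, and the caller clears the per-pass counter explicitly before starting a new pass.

```python
class CyclicSource:
    """Iterator that cycles over a sequence; each pass is capped at len(source) items."""

    def __init__(self, source):
        self.source = source
        self.pointer = 0          # position in source, survives resets
        self.done_this_pass = 0   # items handed out since the last reset

    def __iter__(self):
        # Conventional iterator behaviour: return self, no side effects.
        return self

    def __next__(self):
        if self.done_this_pass >= len(self.source):
            raise StopIteration
        value = self.source[self.pointer]
        self.pointer = (self.pointer + 1) % len(self.source)
        self.done_this_pass += 1
        return value

    def reset(self):
        """Start a new pass from the current position (hypothetical method name)."""
        self.done_this_pass = 0


q = CyclicSource(list(range(5)))
first = [next(q), next(q)]   # consume two items, then stop early
q.reset()                    # explicit, visible reset
second = list(q)             # full pass, starting where we left off
print(first, second)         # [0, 1] [2, 3, 4, 0, 1]
```

The cost is that a bare for i in q: no longer starts a new pass on its own; forgetting q.reset() gives an empty loop once a pass has completed.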
You can reset a generator too, using generator.send() to signal it to reset:
def cyclic_source(source):
    pointer = 0
    done_this_call = 0
    while done_this_call < len(source):
        ret_val = source[pointer]
        done_this_call += 1
        pointer = (pointer + 1) % len(source)
        reset = yield ret_val
        if reset is not None:
            done_this_call = 0
            yield  # pause again for next iteration sequence
Now you can 'reset' your count back to zero:
q = cyclic_source(source)
for count, i in enumerate(q):
    print(i)
    if count == 1:
        break
else:
    print('ran off first for/in')
print('explicitly resetting the generator')
q.send(True)
for i in q:
    print(i)
else:
    print('ran off second for/in')
This is, however, rather counter to readability. I'd instead use an infinite iterator from itertools.cycle(), limited in the number of iterations with itertools.islice():
from itertools import cycle, islice
q = cycle(source)
for count, i in enumerate(islice(q, len(source))):
    print(i)
    if count == 1:
        break
else:
    print('ran off first for/in')
for i in islice(q, len(source)):
    print(i)
else:
    print('ran off second for/in')
q will produce values from source in an endless loop. islice() cuts off iteration after len(source) elements. But because q is reused, it is still maintaining the iteration state.
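Applied to the original test-and-mangle loop, the same idea might look like the sketch below. The function name run_until_idle, the dictionary-based objects, and the test and mangle callables are all made up for illustration. Note that cycle() keeps its own copy of the item references, so it will not notice objects added to or removed from the list afterwards.

```python
from itertools import cycle, islice


def run_until_idle(objects, test, mangle):
    """Mangle one passing object per pass until a full lap finds none."""
    q = cycle(objects)      # shared by every pass, so the position is remembered
    handled = []            # names of mangled objects, kept only for the demo
    while True:
        # islice caps each pass at one full lap over the objects
        for obj in islice(q, len(objects)):
            if test(obj):
                mangle(obj)
                handled.append(obj["name"])
                break       # the next pass resumes right after this object
        else:
            return handled  # a whole lap without a passing object: stop


def decrement(obj):
    obj["n"] -= 1


objects = [{"name": "a", "n": 1}, {"name": "b", "n": 0}, {"name": "c", "n": 2}]
handled = run_until_idle(objects, test=lambda o: o["n"] > 0, mangle=decrement)
print(handled)  # ['a', 'c', 'c']
```

Here the stop condition is simply "a full lap found nothing to mangle"; any other stop condition would replace the while True.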
If you must have a dedicated iterator, stick to a class object and make an iterable, so have it return a new iterator each time __iter__ is called:
from itertools import cycle, islice
class CyclicSource:
    def __init__(self, source):
        self.length = len(source)
        self.source = cycle(source)
    def __iter__(self):
        return islice(self.source, self.length)
This keeps state in the cycle() iterator still, but simply creates a new islice() object each time you create an iterator for this. It basically encapsulates the islice() approach above.
I have made a function: generatesequence (shown below)
def generatesequence(start: float, itera: float = 1, stop: float = None):
    """
    Generate a sequence, that can have a stopping point, starting point.
    """
    __num = start
    # if sequence has a stopping point
    if stop != None:
        # if stop is negative
        if stop < 0:
            # while num is greater than stop (0 < 5, but 0 > -5)
            while __num >= stop:
                # yield __num variable (yield = return without exiting function)
                yield __num
                # add iter to __num
                __num += itera
        else:
            while __num <= stop:
                yield __num
                __num += itera
    else:
        # if sequence has no stopping point, run forever
        while True:
            yield __num
            __num += itera
I have also made a Sequence Class (also shown below)
class Sequence:
    def __init__(self, start, itera, stop):
        self.sequence = generatesequence(start, itera, stop)
        self.sequencelength = iterlen(self.sequence)
        print(self.sequencelength)
    def printself(self):
        for i in range(self.sequencelength):
            print(next(self.sequence))
However, when I run printself on a Sequence instance, it gives me a StopIteration error. How can I fix this?
You don't need to do that with a generator; you can just do the following:
def printself(self):
    for i in self.sequence:
        print(i)
This way you don't need to calculate the length of the generator beforehand.
Calculating the length of a generator defeats the purpose of using one, and it also explains the StopIteration.
Unlike a list or similar data structure, which takes O(n) memory, a generator takes O(1) space, so it cannot know its length without producing its items one by one.
By calculating the length you have advanced the generator from start to end, so it is now exhausted.
Any later call to next() on it therefore raises StopIteration.
The whole point of generators and the like is to save memory for iterables that you know will be iterated at most once. You cannot do two or more full iterations over a generator. To do that, call list() on the generator beforehand and keep the values in a list or similar data structure, or simply recreate the generator after it has been used up (iterated over).
In short, to fix the bug, remove the line in __init__ that computes the length of the generator, and loop with the
"for i in generator_name:"
syntax.
Alternatively, write a method that creates the generator and call it to get a fresh generator whenever and wherever you need one.
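One way to sketch that last suggestion: keep the arguments rather than the generator, and build a fresh generator on every iteration. The generatesequence below is a simplified stand-in for the question's function (ascending case only), and count() is a hypothetical replacement for the iterlen call.

```python
def generatesequence(start, itera=1, stop=None):
    """Simplified stand-in for the question's generator (ascending case only)."""
    num = start
    while stop is None or num <= stop:
        yield num
        num += itera


class Sequence:
    def __init__(self, start, itera, stop):
        # Store the arguments, not a generator, so a fresh one can be made on demand.
        self.args = (start, itera, stop)

    def __iter__(self):
        # Every call returns a brand-new generator, so each loop starts from scratch.
        return generatesequence(*self.args)

    def count(self):
        # Counting consumes a throwaway generator; only safe for finite sequences.
        return sum(1 for _ in self)

    def printself(self):
        for value in self:
            print(value)


seq = Sequence(0, 2, 6)
length = seq.count()    # uses its own generator
values = list(seq)      # still yields everything: [0, 2, 4, 6]
seq.printself()         # and so does this
```

Because nothing holds on to a half-consumed generator, counting and printing no longer interfere with each other.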
Recently I have been using yield in Python, and I find generator functions very useful. My question is whether there is something that can move the imaginary cursor in a generator object backwards. Just as next(genfun) advances to and returns the next item in the container, I would like to know whether there is a function such as previous(genfun) that moves back to the previous item.
Actual Working
def wordbyword():
    words = ["a","b","c","d","e"]
    for word in words:
        yield word
getword = wordbyword()
next(getword)
next(getword)
Output
a
b
What I would like to see and achieve is
def wordbyword():
    words = ["a","b","c","d","e"]
    for word in words:
        yield word
getword = wordbyword()
next(getword)
next(getword)
previous(getword)
Expected Output
a
b
a
This may sound silly, but is there some way to get this previous behaviour from a generator, and if not, why not? Why can't we step the iterator backwards, or am I unaware of an existing method? Please shed some light. What is the closest way to implement what I have here?
No, there is no function that goes back in a generator. The reason is that Python does not natively store the values a generator has already produced, and it has no general way to recalculate them either.
For example, if your generator is a time-sensitive function, such as
from datetime import datetime
def time_sensitive_generator():
    yield datetime.now()
You will have no way to recalculate the previous value in this generator function.
Of course, this is only one of the many possible cases that a previous value cannot be calculated, but that is the idea.
If you do not store the value yourself, it will be lost forever.
As already said, there is no such function since the entire point of a generator is to have a small memory footprint. You would need to store the result.
You could automate the storing of previous results. One use-case of generators is when you have a conceptually infinite list (e.g. that of prime numbers) for which you only need an initial segment. You could write a generator that builds up these initial segments as a side effect. Have an optional history parameter that the generator appends to while it is yielding. For example:
def wordbyword(history = None):
    words = ["a","b","c","d","e"]
    for word in words:
        if isinstance(history,list): history.append(word)
        yield word
If you use the generator without an argument, getword = wordbyword(), it will work like an ordinary generator, but if you pass it a list, that list will store the growing history:
hist = []
getword = wordbyword(hist)
print(next(getword)) #a
print(next(getword)) #b
print(hist) #['a','b']
Iterating over a generator object consumes its elements, so there is nothing to go back to after using next. You could convert the generator to a list and implement your own next and previous:
index = 0
def next(lst):
    global index
    index += 1
    if index > len(lst):
        raise StopIteration
    return lst[index - 1]
def previous(lst):
    global index
    index -= 1
    if index == 0:
        raise StopIteration
    return lst[index - 1]
getword = list(wordbyword())
print(next(getword)) # a
print(next(getword)) # b
print(previous(getword)) # a
One option is to wrap wordbyword with a class that has a custom __next__ method. In this way, you can still use the built-in next function to consume the generator on-demand, but the class will store all the past results from the next calls and make them accessible via a previous attribute:
class save_last:
    def __init__(self, f_gen):
        self.f_gen = f_gen
        self._previous = []
    def __next__(self):
        self._previous.append(n:=next(self.i_gen))
        return n
    def __call__(self, *args, **kwargs):
        self.i_gen = self.f_gen(*args, **kwargs)
        return self
    @property
    def previous(self):
        if len(self._previous) < 2:
            raise Exception
        return self._previous[-2]

@save_last
def wordbyword():
    words = ["a","b","c","d","e"]
    for word in words:
        yield word
getword = wordbyword()
print(next(getword))
print(next(getword))
print(getword.previous)
Output:
a
b
a
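The class above keeps every value ever produced. If only the most recent values matter, a bounded history keeps the memory footprint small. The sketch below is one way to do that with collections.deque; the Rewindable name and the keep parameter are hypothetical, and keep is assumed to be at least 2.

```python
from collections import deque


class Rewindable:
    """Iterator wrapper remembering only the last `keep` values (hypothetical helper)."""

    def __init__(self, iterable, keep=2):
        self._it = iter(iterable)
        self._seen = deque(maxlen=keep)   # older entries fall off automatically

    def __iter__(self):
        return self

    def __next__(self):
        value = next(self._it)
        self._seen.append(value)
        return value

    @property
    def previous(self):
        if len(self._seen) < 2:
            raise LookupError("no previous value yet")
        return self._seen[-2]


getword = Rewindable("abcde")
a, b = next(getword), next(getword)
prev_after_b = getword.previous   # 'a'
c = next(getword)
prev_after_c = getword.previous   # 'b'
```

Like the attribute-based version, this only looks back; it does not rewind the underlying iterator.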
I've created two enumeration methods, one which returns a list and the other which returns a yield/generator:
def enum_list(sequence, start=0):
    lst = []
    num = start
    for sequence_item in sequence:
        lst.append((num, sequence_item))
        num += 1
    return lst
def enum_generator(sequence, start=0):
    num = start
    for sequence_item in sequence:
        yield (num, sequence_item)
        num += 1
A few questions on this:
(1) Is changing a list to a generator as simple as doing:
# build via list
l = list()
for item in items:
    l.append(item)
# build via iterator
# l = list() (1) <== delete this line
for item in items:
    yield item # (2) change l.append(...) to yield ...
(2) Is "lazy evaluation" the only reason to use a generator, or are there other reasons as well?
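(1) Yes. A function becomes a generator function simply by using yield in its body, so replacing l.append(...) with yield ... is all it takes.
(2) Lazy evaluation is the main reason, but not the only one. Because a generator can be iterated only once, generators are also used to model consume-once structures such as stacks and queues. Context managers exploit the same property: a generator decorated with contextlib.contextmanager yields the context exactly once.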
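As an illustration of the context-manager point, here is a minimal sketch using contextlib.contextmanager. The tracked() function and its log argument are invented for the example; the part that matters is the single yield, which splits the generator into setup and teardown.

```python
from contextlib import contextmanager


@contextmanager
def tracked(log, name):
    """Hypothetical context manager: records entry and exit of a block in `log`."""
    log.append(f"enter {name}")
    try:
        yield name                   # the single yield hands control to the with-block
    finally:
        log.append(f"exit {name}")   # runs even if the block raises


events = []
with tracked(events, "job") as label:
    events.append(f"working on {label}")
print(events)  # ['enter job', 'working on job', 'exit job']
```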
An additional difference in your case is that the list is fully built before it is used, whereas the generator is evaluated at each next() call. The generator function can therefore check its surroundings and produce a different result for each yield, depending on external conditions that vary over time.
Consider pseudocode:
def alloted_time():
    while True:
        if len(global_queue)>10:
            yield 5
        else:
            yield 10
If the queue is large, allot 5 minutes to the next person; otherwise allot 10.
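A runnable sketch of the idea, with global_queue as an assumed module-level list standing in for whatever shared state the real program has:

```python
global_queue = []   # assumed shared state; the real program would fill this elsewhere


def alloted_time():
    while True:
        # Evaluated afresh on every next() call, so it sees the current queue.
        yield 5 if len(global_queue) > 10 else 10


slots = alloted_time()
first_slot = next(slots)           # queue is empty: 10 minutes
global_queue.extend(range(11))     # the queue grows between two calls
second_slot = next(slots)          # queue is now long: 5 minutes
print(first_slot, second_slot)     # 10 5
```

A list built up front would have frozen the answer at the moment of construction.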
I have written the following python function(s):
import numpy
def primes_iterable():
    """Iterable giving the primes"""
    # The lowest primes
    primes = [2,3,5]
    for p in primes:
        yield p
    for n in potential_primes():
        m = int(numpy.sqrt(n))
        check = True
        for p in primes:
            if p > m:
                break
            if n%p == 0:
                check = False
        if check:
            primes.append(n)
            yield n
def potential_primes():
    """Iterable starting at 7 and giving back the non-multiples of 2,3,5"""
    yield 7
    n = 7
    gaps = [4,2,4,2,4,6,2,6]
    while 1:
        for g in gaps:
            n += g
            yield n
As you can see, both functions don't have a return statement. Suppose I was to write something like this:
for p in primes_iterable():
    if p > 1000:
        break
    print p
What happens at the level of the memory when the break statement is reached? If I understand correctly, calling primes_iterable() makes the function start, go until the next yield and then pause until it is needed again. When the break statement is reached, does the function instance close up, or does it continue existing in the background, completely useless?
Your function primes_iterable is a generator function. When you call it, nothing happens immediately (other than it returning a generator object). Only when next is called on it does it run to the next yield.
When you call the generator function, you get an iterable generator object. If you're doing that in a for loop, the loop will keep a reference to the generator object while it is running. If you break out of the loop, that reference is released and the generator object can be garbage collected.
But what happens to the code running in the generator function when the generator object is cleaned up? It gets interrupted by a GeneratorExit exception thrown into it at the yield where it was paused. If you need to, you can have your generator function catch this exception, but you can't do anything useful other than cleaning up your resources and exiting. That is often done with a try/finally pair rather than an except clause.
Here's some example code that demonstrates the behavior:
def gen():
    print("starting")
    try:
        while 1:
            yield "foo"
    except GeneratorExit:
        print("caught GeneratorExit")
        raise
    finally:
        print("cleaning up")
Here's a sample run:
>>> for i, s in enumerate(gen()):
        print(s)
        if i >= 3:
            break
starting
foo
foo
foo
foo
caught GeneratorExit
cleaning up
When you break from the for loop there is no reference left to the generator so it will eventually be garbage collected...
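Garbage collection timing is an implementation detail, so when the cleanup must happen at a known point it may be safer to close the generator explicitly. A minimal sketch using contextlib.closing, with a made-up numbers() generator that logs its own cleanup:

```python
from contextlib import closing


def numbers(log):
    """Hypothetical generator that records its own cleanup in `log`."""
    n = 0
    try:
        while True:
            yield n
            n += 1
    finally:
        log.append("cleaned up")


log = []
with closing(numbers(log)) as gen:
    for value in gen:
        if value >= 3:
            break
    log.append("loop done")   # the generator is still suspended at this point
# Leaving the with-block calls gen.close(), which raises GeneratorExit inside it.
print(log)  # ['loop done', 'cleaned up']
```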
Just for clarity: calling primes_iterable() creates a generator. Calling next() on the generator passes control to the generator and it runs until it yields. The for loop implicitly calls next() on each iteration.
Consider this:
prime = primes_iterable()
print(next(prime)) # 2
for p in prime:
    if p > 1000:
        break
    print(p) # 3, 5, 7, ...
Now you still have a reference to the generator called prime so you can always get the next prime:
print(next(prime)) # 1013
primes_iterable() returns an iterator. This is an object which spits out a new value whenever you call next on it. This is what a for loop does behind the scenes. Try this:
it = primes_iterable()
print(next(it))
print(next(it))
Important to note is that it isn't running forever behind the scenes here, it just runs far enough to spit out a new value whenever you ask it to. It keeps hold of its data so that it's ready to start running again whenever, but you can't access that data.
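The pause-and-resume behaviour can be observed with inspect.getgeneratorstate() from the standard library. A small sketch, using a made-up counter() generator rather than primes_iterable():

```python
import inspect


def counter():
    n = 0
    while True:
        yield n
        n += 1


it = counter()
states = [inspect.getgeneratorstate(it)]       # created, no code has run yet
first = next(it)                               # runs up to the first yield
states.append(inspect.getgeneratorstate(it))   # paused at that yield
it.close()                                     # what happens when it is discarded
states.append(inspect.getgeneratorstate(it))
print(states)  # ['GEN_CREATED', 'GEN_SUSPENDED', 'GEN_CLOSED']
```

At no point between the calls is the generator actually executing; it only runs while next() is in progress.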
Now, in your code,
for p in primes_iterable():
As above primes_iterable has been called and has returned an iterator, although in this case the iterator has no name (i.e. it is not bound to a variable). For every step of the loop, p will be assigned to next of the iterator.
    if p > 1000:
        break
Now we break out and the for loop stops running next on the iterator. Nothing references the iterator any more (you can check this by calling dir() which shows you everything defined in the global namespace).
Therefore after a while Python frees up the memory that the iterator was taking up. This is called garbage collection. It's also what will happen if e.g. you type [1,2,3] into the interpreter but don't bind it to a variable name. It is created but then effectively deleted to free up space because it's pointless.
You can (and should) read more about iterators here:
https://docs.python.org/3/tutorial/classes.html#iterators
I can't figure out how to make my doubly linked list's iterableness work correctly when using nested loops.
My code thus far: http://pastebin.com/PU9iFggr
I have attempted to make it iterable:
def __iter__(self):
    self.index = 0
    return (self)
def next(self):
    try:
        result = self._findNode(self.index).get()
    except IndexError:
        self.index = 0
        raise StopIteration
    self.index += 1
    return result
def __getitem__(self, item):
    return self._findNode(item).get()
It seems to work if inside one for loop, but not inside of two:
myList = DoublyLinkedList()
myList.append(0)
myList.append(1)
myList.append(2)
myList.append(3)
for i in myList:
    print i #works as expected
for i in myList:
    for j in myList:
        print j #goes forever
I imagine that the issue is that there is only one self.index inside of the object that is being updated by both of the for loops, but I don't know how to fix this.
Containers should be Iterable, not Iterators. Don't implement next on the class itself. Either make __iter__ a generator function, or write a separate class for it to return that wraps the linked list and implements next.
The easiest approach is to define __iter__ as a generator function:
def __iter__(self):
    cur = self.head
    while cur is not None:
        yield cur.value
        cur = cur.nextNode
Remove the next function from DoublyLinkedList and that's it. When you try to iterate it with a for loop, the call to the generator function returns a new, independent generator object which then iterates independently of any other generators that may have been requested. And it's much faster than repeated indexing like you were doing (which has to start from the head and traverse every time; the generator saves state as it goes, so it's only traversing one link in the chain for each item yielded).
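The other option mentioned above, a separate iterator class, might look like the sketch below. Node, LinkedList and LinkedListIterator are simplified, hypothetical stand-ins (singly linked, Python 3 spelling of __next__), not the code from the pastebin.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.nextNode = None


class LinkedListIterator:
    """Holds one traversal's position; every loop gets its own instance."""

    def __init__(self, head):
        self.cur = head

    def __iter__(self):
        return self

    def __next__(self):          # spelled `next` in Python 2
        if self.cur is None:
            raise StopIteration
        value = self.cur.value
        self.cur = self.cur.nextNode
        return value


class LinkedList:
    """Stripped-down stand-in for the question's DoublyLinkedList."""

    def __init__(self):
        self.head = None
        self.tail = None

    def append(self, value):
        node = Node(value)
        if self.head is None:
            self.head = node
        else:
            self.tail.nextNode = node
        self.tail = node

    def __iter__(self):
        return LinkedListIterator(self.head)   # fresh iterator per loop


my_list = LinkedList()
for v in range(3):
    my_list.append(v)

# Nested iteration now terminates: each loop has its own position.
pairs = [(i, j) for i in my_list for j in my_list]
print(len(pairs))  # 9
```

The container itself holds no iteration state at all, which is what makes the nested loops independent.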
I think you know very well where the problem is:
1 for i in mylist:
2     for j in mylist:
3         print j
4     # when the j loop ends, index goes back to 0; this is where the infinite
5     # loop is. The next line executed is 1, and the method called is
6     # "next()", which reads linkedlist[0] for the second time (and
7     # then repeats... forever)
In short, every time next is called in the i loop, it just returns doubleLinkedList[0], so it makes no progress towards the IndexError that would end the loop.
There are a lot of solutions:
1. If all you do in the nested for loop is print j, you can simply iterate over the length of your linked list:
for i in range(len(mylist)): # I see that you already have the __len__ method
    for j in mylist:
        print j
2. This is my favorite solution: instead of implementing an iterator interface, use a Python generator:
def traverseList(doubly_linked_list):
    # select and delete all of your __iter__() and next(), use the following code
    index = 0
    while True:
        try:
            yield doubly_linked_list._findNode(index).get()
            index += 1
        except IndexError:
            break
for i in traverseList(mylist):
    for j in traverseList(mylist):
        # do things, note that I did not create two linked list
        # I sort of create two iterators...
        pass
You can look up coroutines if you are not too familiar with generators. Basically, each generator has its own frame with its own local variables, so each iterator over your doubly linked list maintains its own index (which is what you were trying to achieve in your code).
3.hmmm I am still thinking, I will update if I got any new ideas