I have a generator function like the following:
def myfunct():
    ...
    yield result
The usual way to call this function would be:
for r in myfunct():
    dostuff(r)
My question: is there a way to get just one element from the generator whenever I like?
For example, I'd like to do something like:
while True:
    ...
    if something:
        my_element = pick_just_one_element(myfunct())
        dostuff(my_element)
    ...
Create a generator using
g = myfunct()
Every time you would like an item, use
next(g)
(or g.next() in Python 2.5 or below).
If the generator exits, it will raise StopIteration. You can either catch this exception if necessary, or use the default argument to next():
next(g, default_value)
To pick just one element of a generator, use break in a for statement, or list(itertools.islice(gen, 1)).
Following your example literally, you can do something like:
while True:
    ...
    if something:
        for my_element in myfunct():
            dostuff(my_element)
            break
        else:
            do_generator_empty()
If you want to "get just one element from the [once generated] generator whenever I like" (I'd guess there is a 50% chance that's the original intention, and it is the most common one), then:
gen = myfunct()
while True:
    ...
    if something:
        for my_element in gen:
            dostuff(my_element)
            break
        else:
            do_generator_empty()
This way explicit use of generator.next() can be avoided, and end-of-input handling doesn't require (cryptic) StopIteration exception handling or extra default value comparisons.
The else: clause of the for statement is only needed if you want to do something special when the generator is exhausted.
Note on next() / .next():
In Python 3 the .next() method was renamed to .__next__() for good reason: it's considered low-level (PEP 3114). Before Python 2.6 the builtin function next() did not exist. It was even discussed whether to move next() to the operator module (which would have been wise), because of its rare need and the questionable inflation of builtin names.
Using next() without a default is still very low-level practice - openly throwing the cryptic StopIteration like a bolt out of the blue in normal application code. And using next() with a default sentinel - which ideally should be the only option for a next() directly in builtins - is limited and often leads to odd, non-pythonic logic and poor readability.
Bottom line: Using next() should be very rare - like using functions of the operator module. Using for x in iterator, islice, list(iterator) and other functions that accept an iterator seamlessly is the natural way of using iterators at application level - and almost always possible. next() is low-level, an extra concept, unobvious - as the question of this thread shows - while using break in a for loop, for example, is conventional.
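As a rough sketch of the two next()-free approaches mentioned above (for/break/else and islice), written for Python 3. The names letters and take_one are hypothetical helpers made up for this illustration:

```python
from itertools import islice

def letters():
    # Hypothetical generator, for illustration only.
    yield "a"
    yield "b"

def take_one(iterator, on_empty="<empty>"):
    # for/break/else: the else branch runs only when the loop never hit
    # break, which here means the iterator had nothing left to give.
    for item in iterator:
        result = item
        break
    else:
        result = on_empty
    return result

gen = letters()
picked = [take_one(gen), take_one(gen), take_one(gen)]

# islice(gen2, 1) yields at most one item; the rest stays in the generator.
gen2 = letters()
one = list(islice(gen2, 1))
rest = list(gen2)
```

Here picked ends up as ['a', 'b', '<empty>'], since the third call finds the shared generator exhausted.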
A generator function is a function that produces an iterator. Therefore, once you have an iterator instance, use next() to fetch the next item from it.
As an example, use next() function to fetch the first item, and later use for in to process remaining items:
# create new instance of iterator by calling a generator function
items = generator_function()
# fetch and print first item
first = next(items)
print('first item:', first)
# process remaining items:
for item in items:
    print('next item:', item)
You can pick specific items using destructuring, e.g.:
>>> first, *middle, last = range(10)
>>> first
0
>>> middle
[1, 2, 3, 4, 5, 6, 7, 8]
>>> last
9
Note that this is going to consume your generator, so while highly readable, it is less efficient than something like next(), and ruinous on infinite generators:
>>> first, *rest = itertools.count()
🔥🔥🔥
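For an infinite iterator, a bounded slice is a safer alternative to unpacking. A small sketch, assuming Python 3 (the variable names are arbitrary):

```python
from itertools import count, islice

# Unpacking a finite generator: everything is consumed and stored in a list.
first, *rest = (n * n for n in range(5))

# For an infinite iterator, take a bounded slice instead of unpacking.
numbers = count()
head = list(islice(numbers, 3))
following = next(numbers)  # the iterator simply continues where islice stopped
```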
I don't believe there's a convenient way to retrieve an arbitrary value from a generator. The generator will provide a next() method to traverse itself, but the full sequence is not produced immediately to save memory. That's the functional difference between a generator and a list.
generator = myfunct()
while True:
    my_element = generator.next()
Make sure to catch the StopIteration exception raised after the last element is taken.
For those of you scanning through these answers for a complete working example for Python3... well here ya go:
def numgen():
    x = 1000
    while True:
        x += 1
        yield x

nums = numgen()  # because it must be the _same_ generator
for n in range(3):
    numnext = next(nums)
    print(numnext)
This outputs:
1001
1002
1003
I believe the only way is to get a list from the iterator then get the element you want from that list.
l = list(myfunct())
l[4]
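Building a list is not strictly required, though. itertools.islice can skip ahead without storing the earlier items, which also works for generators that never end. A sketch modeled on the nth recipe from the itertools documentation; squares is a hypothetical generator used only for this example:

```python
from itertools import islice

def squares():
    # Hypothetical infinite generator: list(squares()) would never finish.
    n = 0
    while True:
        yield n * n
        n += 1

def nth(iterable, n, default=None):
    # Skip n items, then return the next one (or default if there are too few).
    return next(islice(iterable, n, None), default)

fifth = nth(squares(), 4)
missing = nth(iter([1, 2]), 4, default="n/a")
```

Note that the skipped items are still generated and discarded, so this saves memory rather than time.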
Related
I recently studied a python recursion function and found that the recursion stops when it uses element in []. So I made a simple test function, found that there is even no print out. So how can I understand the element in []? Why does the function stop when referring to element in []?
b=1
def simple():
    for i in []:
        print('i am here')
        return i+b
a = simple()
Python's in keyword has two purposes.
One use is as part of a for loop, which is written for element in iterable. This assigns each value from iterable to element on each pass through the loop body. This is how your example function is using in (though since the list you're looping over is empty, the loop never does anything).
The other way you can use in is as an operator. An expression like x in y tests if element x is present in container y. (There's also a negated version of the in operator, not in. The expression x not in y is exactly equivalent to not (x in y).) I suspect this is what your recursive code is doing. This would also not be useful to do with an empty list literal (since an empty list by definition doesn't contain anything), but I'm guessing the real recursive function is a bit more complicated.
As an example of both uses of in, here's a generator function that uses a set to filter out duplicate items from some other iterable. It has a for loop that has in, and it also uses in (well, technically not in) as an operator to test if the next value from the input iterator is contained in the seen set:
def unique(iterable):
    seen = set()
    for item in iterable:  # "in" used by for loop
        if item not in seen:  # "in" used here as an operator
            yield item
            seen.add(item)
A recursive function calls itself n-number of times, then returns a terminating value on the last recursion that backs out of the recursive stacks.
Example:
compute the factorial of a number:
def fact(n):
    # ex: 5 * 4 * 3 * 2 * 1
    # n == 0 is your terminating recursion
    if n == 0:
        return 1
    # else is your recursion call to fact(n-1)
    else:
        return n * fact(n-1)
In your example, there is no recursive call to simple() within the function, nor are there any elements inside the empty list [] to step through; therefore the body of your for loop never executes.
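To see this in isolation, here is a small sketch. It is a variant of the question's function that takes the list as a parameter (an assumption made so both cases can be compared side by side):

```python
def simple(values):
    # Variant of the question's function; the list is passed in as a parameter.
    for i in values:
        print('i am here')
        return i + 1
    # Falling off the end of a function returns None implicitly.

empty_result = simple([])           # loop body never runs, nothing is printed
nonempty_result = simple([10, 20])  # prints once, returns on the first item
```

With an empty list the function prints nothing and returns None, which is why the original test appeared to do nothing at all.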
It comes down to how the for loop works.
The iterable you want to traverse (which is [] in your example) has a length of 0, so the body of the loop (which includes the print and so on) is never executed.
Hope it helps.
I have an iterator it that I assume is already sorted, but I would like to raise an exception if it isn't.
The data from the iterator is not in memory, so I do not want to use the sorted() builtin because, AFAIK, it puts the whole iterator in a list.
The solution I'm using now is to wrap the iterator in a generator function like this:
def checkSorted(it):
    prev_v = it.next()
    yield prev_v
    for v in it:
        if v >= prev_v:
            yield v
            prev_v = v
        else:
            raise ValueError("Iterator is not sorted")
So that I can use it like this:
myconsumer(checkSorted(it))
Does someone know if there are better solutions?
I know that my solution works but it seems quite strange (at least to me) writing a module on my own to accomplish such a trivial task. I'm looking for a simple one liner or builtin solution (If it exists)
Basically, your solution is almost as elegant as it gets (you could of course put it in a utility module if you find it generally useful). If you wanted, you could use an infinity object to cut the code down a bit, but then you have to include a class definition as well, which grows the code again (unless you inline the class definition):
def checkSorted(it):
    prev = type("", (), {"__lt__": lambda a, b: False})()
    for x in it:
        if prev < x:
            raise ValueError("Not sorted")
        prev = x
        yield x
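The first line uses type() to create a class and immediately instantiate it. Because its __lt__ always returns False, an object of this class never compares less than anything, so it acts as an infinity object. Note that, as written, the check raises when prev < x, so it accepts non-increasing sequences.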
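For comparison, here is a sketch that avoids the comparison class altogether by using a plain object() sentinel for "no previous item yet". It assumes Python 3 and ascending order, as in the question. The names check_sorted and _unset are made up for this illustration:

```python
def check_sorted(iterable):
    """Yield items unchanged; raise ValueError at the first out-of-order item."""
    _unset = object()  # sentinel: no previous item seen yet
    prev = _unset
    for item in iterable:
        if prev is not _unset and item < prev:
            raise ValueError("Iterator is not sorted")
        prev = item
        yield item

ok = list(check_sorted(iter([1, 2, 2, 5])))

# The check is lazy: items before the violation are still delivered.
seen = []
try:
    for value in check_sorted(iter([1, 3, 2, 4])):
        seen.append(value)
    failed = False
except ValueError:
    failed = True
```

Because nothing calls next() by hand, an empty input simply yields nothing instead of raising.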
The problem with doing a one-liner is that you have to deal with three constructs: you have to update state (assignment), throw an exception, and loop. You could easily do these with statements, but making them into a one-liner means you have to put the statements on the same line, which in turn causes problems with the loop and if constructs.
If you want to put the whole thing into an expression, you will have to use dirty tricks: itertools can provide the assignment and the looping, and the throwing can be done with the throw method of a generator (which can also be written as an expression):
imap( lambda i, j: (i >= j and j or (_ for _ in ()).throw(ValueError("Not sorted"))), *(lambda pre, it: (chain([type("", (), {"__ge__": lambda a, b: True})()], pre), it))(*tee(it)))
The last it is the iterator you want to check and the expression evaluates to a checked iterator. I agree it's not good looking and not obvious what it does, but you asked for it (and I don't think you wanted it).
As an alternative, I suggest using itertools.izip_longest (zip_longest in Python 3) to create a generator of consecutive pairs.
You can use tee to create two independent iterators from the original iterable.
from itertools import izip_longest, tee

def checkSorted(it):
    pre, it = tee(it)
    next(it)
    for i, j in izip_longest(pre, it):
        if j is not None:  # None is the fill value once the shifted copy runs out
            if i >= j:
                yield i
            else:
                raise ValueError("Iterator is not sorted")
        else:
            yield i
Demo :
it=iter([5,4,3,2,1])
print list(checkSorted(it))
[5, 4, 3, 2, 1]
it=iter([5,4,3,2,3])
print list(checkSorted(it))
Traceback (most recent call last):
File "/home/bluebird/Desktop/ex2.py", line 19, in <module>
print list(checkSorted(it))
File "/home/bluebird/Desktop/ex2.py", line 10, in checkSorted
raise ValueError("Iterator is not sorted")
ValueError: Iterator is not sorted
Note: I think there is no need to yield the values of your iterable when you already have them. As a more elegant approach, I suggest using a generator expression inside the all() function and returning a bool value:
from itertools import izip, tee

def checkSorted(it):
    pre, it = tee(it)
    next(it)
    return all(i >= j for i, j in izip(pre, it))
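A rough Python 3 equivalent, where the builtin zip is already lazy and replaces izip. The function name is_sorted_descending is my own and reflects the i >= j comparison used above. Passing a default to next() is an added guard for empty input:

```python
from itertools import tee

def is_sorted_descending(iterable):
    # tee gives two independent iterators over the same data.
    pre, nxt = tee(iterable)
    # Advance one copy by one item; the default avoids StopIteration on empty input.
    next(nxt, None)
    # zip stops at the shorter iterator, so the pairs are (x0, x1), (x1, x2), ...
    return all(a >= b for a, b in zip(pre, nxt))

descending = is_sorted_descending(iter([5, 4, 3, 2, 1]))
broken = is_sorted_descending(iter([5, 4, 3, 2, 3]))
```

Keep in mind that this consumes the iterator, so it only fits cases where the data can be produced again afterwards.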
I am trying to create a generator in python 3.4 using the built in next() function. Here is my current code:
myGenerator = next(t for t in [1,2,3,4])
myGenerator
1
myGenerator
1.....
Whenever I call the generator I built, it only returns 1 each time I call it, which is strange, as I thought that each element in the generator can only be iterated through once. How do I fix this code, so that it will print out 1,2,3,4, in that order? Thanks for the help.
You are close: you actually already have the generator you want, which is
(t for t in [1,2,3,4])
but then you run next on it. If you look at its documentation, you can see that it does exactly what the docs say: you applied next to the generator, it returned an object, and you bound this object to something you called myGenerator (it's probably yours, but it's not a generator). Each time you refer to it, it evaluates to that same simple object.
>>> myGenerator = (t for t in [1,2,3,4])
>>> print(next(myGenerator))
1
>>> print(next(myGenerator))
2
If you don't know how to use next, I suggest you use yield:
def g(l):
    for x in l:
        yield x

for i in g([1,2,3,4]):
    print(i)
Demo:
1
2
3
4
yield achieves the same result.
I have a text file like this:
11
2
3
4

11

111
Using Python 2.7, I want to turn it into a list of lists of lines, where line breaks divide items in the inner list and empty lines divide items in the outer list. Like so:
[["11","2","3","4"],["11"],["111"]]
And for this purpose, I wrote a generator function that would yield the inner lists one at a time once passed an open file object:
def readParag(fileObj):
    currentParag = []
    for line in fileObj:
        stripped = line.rstrip()
        if len(stripped) > 0: currentParag.append(stripped)
        elif len(currentParag) > 0:
            yield currentParag
            currentParag = []
That works fine, and I can call it from within a list comprehension, producing the desired result. However, it subsequently occurred to me that I might be able to do the same thing more concisely using itertools.takewhile (with a view to rewriting the generator function as a generator expression, but we'll leave that for now). This is what I tried:
from itertools import takewhile
def readParag(fileObj):
    yield [ln.rstrip() for ln in takewhile(lambda line: line != "\n", fileObj)]
In this case, the resulting generator yields only one result (the expected first one, i.e. ["11","2","3","4"]). I had hoped that calling its next method again would cause it to evaluate takewhile(lambda line: line != "\n", fileObj) again on the remainder of the file, thus leading it to yield another list. But no: I got a StopIteration instead. So I surmised that the take while expression was being evaluated once only, at the time when the generator object was created, and not each time I called the resultant generator object's next method.
This supposition made me wonder what would happen if I called the generator function again. The result was that it created a new generator object that also yielded a single result (the expected second one, i.e. ["11"]) before throwing a StopIteration back at me. So in fact, writing this as a generator function effectively gives the same result as if I'd written it as an ordinary function and returned the list instead of yielding it.
I guess I could solve this problem by creating my own class to use instead of a generator (as in John Millikin's answer to this question). But the point is that I was hoping to write something more concise than my original generator function (possibly even a generator expression). Can somebody tell me what I'm doing wrong, and how to get it right?
What you're trying to do is a perfect job for groupby:
from itertools import groupby
def read_parag(filename):
    with open(filename) as f:
        for k,g in groupby((line.strip() for line in f), bool):
            if k:
                yield list(g)
which will give:
>>> list(read_parag('myfile.txt'))
[['11', '2', '3', '4'], ['11'], ['111']]
Or in one line:
[list(g) for k,g in groupby((line.strip() for line in open('myfile.txt')), bool) if k]
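To try the groupby idea without a file on disk, here is a sketch that takes any iterable of lines. io.StringIO stands in for the file, and read_paragraphs is a hypothetical name for this variant:

```python
from io import StringIO
from itertools import groupby

def read_paragraphs(lines):
    stripped = (line.strip() for line in lines)
    # bool('') is False and bool('11') is True, so groupby alternates between
    # runs of blank lines (key False) and runs of non-blank lines (key True).
    for has_text, group in groupby(stripped, bool):
        if has_text:
            yield list(group)

# Note the two consecutive blank lines before '111'.
sample = StringIO("11\n2\n3\n4\n\n11\n\n\n111\n")
paragraphs = list(read_paragraphs(sample))
```

Since a whole run of blank lines forms a single group, several empty lines in a row do not produce empty paragraphs.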
The other answers do a good job of explaining what is going on here: you need to call takewhile multiple times, which your current generator does not do. Here is a fairly concise way to get the behavior you want using the built-in iter() function with a sentinel argument:
from itertools import takewhile
def readParag(fileObj):
    cond = lambda line: line != "\n"
    return iter(lambda: [ln.rstrip() for ln in takewhile(cond, fileObj)], [])
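A quick way to check this behavior, and one limitation of it, is to feed the function an in-memory file. This is a sketch of the same technique under Python 3. read_parag and next_paragraph are illustrative names, and io.StringIO stands in for the real file:

```python
from io import StringIO
from itertools import takewhile

def read_parag(file_obj):
    def next_paragraph():
        # Reads up to and including the next blank line (or the end of the file).
        return [ln.rstrip() for ln in takewhile(lambda line: line != "\n", file_obj)]
    # iter(callable, sentinel) keeps calling next_paragraph() until it returns [].
    return iter(next_paragraph, [])

single = list(read_parag(StringIO("11\n2\n3\n4\n\n11\n\n111\n")))

# Two blank lines in a row produce an empty paragraph, which equals the
# sentinel, so iteration stops early and '111' is never reached.
double = list(read_parag(StringIO("11\n\n\n111\n")))
```

So this form suits input where paragraphs are separated by exactly one empty line.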
This is exactly how .takewhile() should behave. While the condition is true, it'll return elements from the underlying iterable, and as soon as it's false, it permanently switches to the iteration-done stage.
Note that this is how iterators must behave; raising StopIteration means just that, stop iterating over me, I am done.
From the python glossary on "iterator":
An object representing a stream of data. Repeated calls to the iterator’s next() method return successive items in the stream. When no more data are available a StopIteration exception is raised instead. At this point, the iterator object is exhausted and any further calls to its next() method just raise StopIteration again.
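A small sketch of that behavior with takewhile, using a list iterator in place of the file (the variable names are arbitrary):

```python
from itertools import takewhile

data = iter([1, 2, 9, 1, 2])
small = takewhile(lambda x: x < 5, data)

first_run = list(small)   # stops when it reaches 9
second_run = list(small)  # the takewhile object stays finished
leftover = list(data)     # the underlying iterator still has items
```

The 9 that ended the first run has been consumed from data, which is why leftover starts at the following item. The same thing happens to the blank separator line when takewhile reads from a file.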
You could combine takewhile with tee to see if there are any more results in the next batch:
import itertools
def readParag(filename):
    with open(filename) as f:
        while True:
            paras = itertools.takewhile(lambda l: l.strip(), f)
            test, paras = itertools.tee(paras)
            test.next()  # raises StopIteration when the file is done
            yield (l.strip() for l in paras)
This yields generators, so each item yielded is itself a generator. You do need to consume all elements in these generators for this to continue to work; the same is true for the groupby method listed in another answer.
If the file contents fit into memory, there is a much easier way to get the groups separated by blank lines:
with open("filename") as f:
    groups = [group.split() for group in f.read().split("\n\n")]
This approach can be made more robust by using re.split() instead of str.split() and by filtering out potential empty groups resulting from four or more consecutive line breaks.
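One possible shape of that more robust variant, as a sketch. The regular expression and the sample text are my own choices for illustration, not taken from the answer:

```python
import re

# Sample text with four consecutive line breaks before '111'.
text = "11\n2\n3\n4\n\n11\n\n\n\n111\n"

# Split on a newline, any whitespace-only lines, then another newline.
raw_groups = re.split(r"\n\s*\n", text)

# splitlines() keeps one entry per line; skip groups that are empty.
groups = [g.splitlines() for g in raw_groups if g.strip()]
```

Using splitlines() instead of split() keeps lines that contain spaces intact.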
This is the documented behavior of takewhile. It takes while the condition is true. It doesn't start up again if the condition later becomes true again.
The simple fix is to make your function just call takewhile in a loop, stopping when takewhile has nothing more to return (i.e., at the end of the file):
def readParag(fileObj):
    while True:
        nextList = [ln.rstrip() for ln in takewhile(lambda line: line != "\n", fileObj)]
        if not nextList:
            break
        yield nextList
You can call takewhile multiple times:
>>> def readParagGenerator(fileObj):
...     group = [ln.rstrip() for ln in takewhile(lambda line: line != "\n", fileObj)]
...     while len(group) > 0:
...         yield group
...         group = [ln.rstrip() for ln in takewhile(lambda line: line != "\n", fileObj)]
...
>>> list(readParagGenerator(StringIO(F)))
[['11', '2', '3', '4'], ['11'], ['111']]
How can one loop through a generator? I thought about this way:
gen = function_that_returns_a_generator(param1, param2)
if gen:  # in case the generator is null
    while True:
        try:
            print gen.next()
        except StopIteration:
            break
Is there a more pythonic way?
Simply
for x in gen:
    # whatever
will do the trick. Note that if gen always evaluates to True, so that test is unnecessary.
for item in function_that_returns_a_generator(param1, param2):
    print item
You don't need to test whether anything is returned by your function: if nothing is yielded, you simply won't enter the loop.
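A short sketch showing both points at once, written for Python 3. The generator empty_gen is a hypothetical example that yields nothing:

```python
def empty_gen():
    # A generator function that yields nothing.
    return
    yield  # unreachable, but its presence makes this a generator function

gen = empty_gen()
is_truthy = bool(gen)  # a generator object is truthy even when it is empty

visited = []
for item in gen:
    visited.append(item)  # never runs, because nothing is yielded
```

The truth value says nothing about how many items remain, so the for loop is the only check needed.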
In case you don't need the output of the generator because you care only about its side effects, you can use the following one-liner:
for _ in gen: pass
Follow up
Following the comment by aiven, I ran some performance tests. While list(gen) seems slightly faster than for _ in gen: pass, it turns out that tuple(gen) is even faster. However, as Erik Aronesty correctly points out, tuple(gen) and list(gen) store the results, so my final advice is to use
tuple(gen)
but only if the generator is not going to loop billions of times soaking up too much memory.
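If memory is the concern, another option (not mentioned above, and not benchmarked here) is a zero-length collections.deque, which is the idea behind the consume recipe in the itertools documentation. A sketch, with side_effects and log as made-up names:

```python
from collections import deque

log = []

def side_effects():
    # Hypothetical generator whose side effect is appending to log.
    for i in range(3):
        log.append(i)
        yield i

# maxlen=0 makes the deque discard every item as it arrives,
# so the generator runs to completion without anything being stored.
leftover = deque(side_effects(), maxlen=0)
```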
You can simply loop through it:
>>> gen = (i for i in range(1, 4))
>>> for i in gen: print i
1
2
3
But be aware that you can only loop once. The next time, the generator will be empty:
>>> for i in gen: print i
>>>
The other answers are good for complicated scenarios. If you simply want to stream the items into a list:
x = list(generator)
For simple preprocessing, use list comprehensions:
x = [tup[0] for tup in generator]
If you just want to execute the generator without saving the results, you can skip variable assignment:
# no var assignment b/c we don't need what print() returns
[print(_) for _ in gen]
Don't do this if your generator is infinite (say, streaming items from the internet). The list construction is a blocking op that won't stop until the generator is empty.
Just treat it like any other iterable:
for val in function_that_returns_a_generator(p1, p2):
    print val
Note that if gen: will always be True, so it's a pointless test.
If you want to manually move through the generator (i.e., to work with each loop manually) then you could do something like this:
from pdb import set_trace
for x in gen:
    set_trace()
    # do whatever you want with x at the command prompt
    # use pdb commands to step through each iteration of the generator, e.g., c  # continue