"Functions that consume an entire iterable won't terminate"? - python

In David Beazley's talk on generators, he states, as a caveat:
Functions that consume an entire iterable won't terminate (min, max,
sum, set, etc.)
What is meant here?
gen = (x*2 for x in [1,2,3,4,5])
sum(gen) terminates just fine.

He's referring to infinite sequences: if you pass an infinite sequence to max et al., they will never be able to return a value.
Want to replicate? Aside from building a custom infinite sequence, Python has a built-in set of these in itertools (namely repeat, count, cycle). Try and do:
from itertools import repeat
max(repeat(20))
and see what happens. Actually, don't do that: max will keep on munching away as repeat keeps on giving numbers [1]. It's a love affair that stands the test of time and never terminates :-)
[1] Imagine Pac-Man on a never-ending straight line, constantly eating those little yellow thingies. Pac-Man = max, yellow thingies generated by repeat.

If you pay attention, you'll note that he writes that in Part 5 of the presentation, "Processing Infinite Data". Since infinite generators yield an infinite number of items, functions that attempt to consume the entire generator will never return.

An endless generator won't terminate when consumed:
def gen():
    while True:
        yield 1
sum(gen())
Note: Don't actually execute the last line.

In the provided document, the comment is directed at the follow function on page 39, which is designed to block the program until the file is appended to. Any infinite generator will fail to terminate when passed to a function that consumes an entire iterable.

He is talking about infinite iterators, many of which can be found in Python's itertools module. If you pass an infinite iterator to functions like min, max, sum or set, they won't return.
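A minimal sketch of how such a call can be made to terminate: bound the infinite iterator before handing it to the consuming function. The cut-off values (5 and 1000) are arbitrary choices for illustration.

```python
from itertools import count, islice, repeat

# islice yields only the first n items, so sum() sees a finite iterable.
total = sum(islice(count(1), 5))          # 1 + 2 + 3 + 4 + 5

# The same trick makes max() return when the source is repeat(20).
largest = max(islice(repeat(20), 1000))
```

Without the islice wrapper, both calls would run forever.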

Related

How to set page range stop value when it can be anything in Python

I've set up my first script to scrape webpages, but currently by defining a start and stop point in my range like so:
for page in range(1, 4):
However, I have no idea how many pages there will be at any given moment (value will regularly change) - what should I use as my stop value in this scenario? I want to avoid the hack of putting a ridiculously high value.
There are many ways to skin this cat, but I would use the count method in itertools:
import itertools
for page in itertools.count(1):
    ...
My first thought was that it would be easy to write your own generator to do this. That thought was immediately followed by others that led me to the solution I gave above. But what if there was some sort of sequence you wanted to generate that wasn't handled by an existing generator? Generators really are trivial to write. To illustrate, here's roughly how the count generator could be written in pure Python (in CPython, the real itertools.count is implemented in C):
def count(start=0, step=1):
    n = start
    while True:
        yield n
        n += step
About the simplest thing you could write, yes? It's good to know about yield and how generators work in general.
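As a sketch of how the open-ended page loop might end, the loop can break as soon as a page comes back empty. The fetch_page function and its fake data are hypothetical stand-ins for the real scraper, and "an empty page means no more pages" is an assumption about the site.

```python
from itertools import count

def fetch_page(page):
    """Hypothetical stand-in for a real scraper: only pages 1-3 have items."""
    fake_site = {1: ["a", "b"], 2: ["c"], 3: ["d", "e"]}
    return fake_site.get(page, [])

def scrape_all():
    results = []
    for page in count(1):          # 1, 2, 3, ... with no upper bound
        items = fetch_page(page)
        if not items:              # assumed stop condition: empty page
            break
        results.extend(items)
    return results

all_items = scrape_all()
```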

Can you get all positive numbers in python?

So I am playing around in python and I thought:
"Is there a built in list or function that already has infinite numbers, if not how could I build one?"
I did some research and found...
Nothing
That is why I am here!
If you don't know what I mean it would be this but infinite:
b = [1,2,3,4,5,6,7,8,9]
etc;
If there is please tell me! Thank you!
Edit: Thank you!
You can define your own infinite list using a generator function without any module imports.
def infinite_list():
    n = 1
    while True:
        yield n
        n += 1
my_list = infinite_list()
Each time you want the next number, call the built-in next function:
next(my_list)
You can also iterate over it but it will, of course, be an infinite loop:
for elem in my_list:
    ...  # infinite loop
itertools.count produces an iterator over an unbounded, evenly spaced series of numbers. It's not a sequence (you can't index it, and you can only iterate each instance once), but you can keep going forever.
Actually running it to completion will take infinite time of course, so it's typically paired with something that limits it, e.g. itertools.islice, zip-ing with a finite iterable, etc.
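A short sketch of the two limiting techniques mentioned above; the sample letters are made up for illustration.

```python
from itertools import count, islice

# Take the first nine positive integers from the infinite counter.
first_nine = list(islice(count(1), 9))

# zip stops at its shortest input, so the finite list bounds the pairing.
labelled = list(zip(count(1), ["a", "b", "c"]))
```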
from itertools import count
for i in count(1):
    print(i)
It's infinite, though, so it might take a while to run.

Infinite for loop in Python like C++

In C++ we can write an infinite for loop like for(;;). Is there any syntax like this to write an infinite for loop in Python?
Note: I know that if I write for i in range(a_very_big_value) then it may run practically forever. I am looking for a simple syntax like C++'s, or any other trick, to write an infinite for loop in Python.
Python's for statement is a "for-each" loop (sort of like range-for in C++11 and later), not a C-style "for-computation" loop.
But notice that in C, for (;;) does the same thing as while (1). And Python's while loop is basically the same as C's with a few extra bells and whistles. And, in fact, the idiomatic way to loop forever is: [1]
while True:
    ...
If you really do want to write a for loop that goes forever, you need an iterable of infinite length. You can grab one out of the standard library: [2]
for _ in itertools.count():
    ...
… or write one yourself:
def forever():
    while True:
        yield None
for _ in forever():
    ...
But again, this isn't really that similar to for (;;), because it's a for-each loop.
[1] while 1: used to be a common alternative. It was faster in older versions of Python, although not in current ones, and occasionally that mattered.
[2] Of course the point of count isn't just going on forever, it's counting up numbers forever. For example, if enumerate didn't exist, you could write it as zip(itertools.count(), it).
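The zip(itertools.count(), it) idea from the second footnote can be sketched like this; my_enumerate is a name chosen here for illustration, not a standard function.

```python
from itertools import count

def my_enumerate(iterable, start=0):
    # zip stops when the finite iterable runs out, so count never runs away.
    return zip(count(start), iterable)

pairs = list(my_enumerate("abc"))
```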
Yes, it is possible.
With a while loop:
while True:
    ...
With a for loop (just for kicks):
from itertools import cycle
for _ in cycle(range(1)):
    ...
cycle(range(1)) yields 0 indefinitely.
In either case, it's up to you to implement your loop logic in such a way that you terminate eventually. And lastly, if you want to implement an execute-until-___ loop, you should stick to while True, because that's the idiomatic way of doing it.
I found the answer from here and here
Using itertools.count:
import itertools
for i in itertools.count():
    if there_is_a_reason_to_break(i):
        break
In Python2 xrange() is limited to sys.maxint, which may be enough for most practical purposes:
import sys
for i in xrange(sys.maxint):
    if there_is_a_reason_to_break(i):
        break
In Python3, range() can go much higher, though not to infinity:
import sys
for i in range(sys.maxsize**10):  # you could go even higher if you really want
    if there_is_a_reason_to_break(i):
        break
So it's probably best to use count()
It is also possible to achieve this by mutating the list you're iterating on, for example:
l = [1]
for x in l:
    l.append(x + 1)
    print(x)
Yes, here you are:
for i in __import__("itertools").count():
    pass
The infinite iterator part was taken from this answer. If you really think about it though, a while loop looks way better.
while True:
    pass
You can use:
while True:
    # Do stuff here
    pass
or:
for _ in iter(str, "forever"):
    pass
This may help. iter(str, "forever") creates an iterator that calls str() with no arguments each time its next value is requested, and str() returns an empty string. If the returned value were equal to the sentinel "forever", StopIteration would be raised and the loop would end; since "" != "forever", that never happens, so the loop keeps running.
REF: https://docs.python.org/3/library/functions.html?highlight=iter#iter
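To see the two-argument form of iter stop for real, here is a small sketch with a sentinel that does occur. The sample values and the choice of 0 as the sentinel are made up for illustration.

```python
values = iter([3, 1, 4, 0, 9])

# iter(callable, sentinel) keeps calling the callable until it returns
# the sentinel, then stops without yielding the sentinel itself.
collected = list(iter(lambda: next(values), 0))

# str() with no arguments returns "", which never equals "forever".
empty = str()
```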

for or while loop to do something n times [duplicate]

In Python you have two fine ways to repeat some action more than once. One of them is the while loop and the other is the for loop. So let's have a look at two simple pieces of code:
for i in range(n):
    do_sth()
And the other:
i = 0
while i < n:
    do_sth()
    i += 1
My question is which of them is better. Of course, the first one, which is very common in documentation examples and various pieces of code you could find around the Internet, is much more elegant and shorter, but on the other hand it creates a completely useless list of integers just to loop over them. Isn't it a waste of memory, especially as far as big numbers of iterations are concerned?
So what do you think, which way is better?
"...but on the other hand it creates a completely useless list of integers just to loop over them. Isn't it a waste of memory, especially as far as big numbers of iterations are concerned?"
In Python 2, that is what xrange(n) is for. It avoids creating a list of numbers and instead provides a lazy object that produces them on demand.
In Python 3, xrange() was renamed to range() - if you want a list, you have to specifically request it via list(range(n)).
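A brief Python 3 sketch of what that laziness means in practice; the size 10**9 is an arbitrary illustrative value.

```python
big = range(10**9)           # no billion-element list is built here

size = len(big)              # derived from start, stop and step
tenth = big[10]              # indexing works without iterating
has_value = 123456 in big    # membership test for ints is arithmetic
as_list = list(range(5))     # a real list only when explicitly requested
```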
itertools.repeat is lighter weight than xrange (and the while loop) since it doesn't even need to create the int objects. It also works equally well in Python 2 and Python 3:
from itertools import repeat
for i in repeat(None, 10):
    do_sth()
In both Python 3 and Python 2, just use range():
for _ in range(n):
    ...  # do something n times exactly
The fundamental difference in most programming languages is that, unless something unexpected happens, a for loop repeats n times or until a break statement (which may be conditional) is reached, and then finishes. A while loop may repeat 0 times, once, many times, or even forever, depending on a condition that must be true at the start of each iteration and is false when the loop exits normally. (For completeness: a do ... while loop, or repeat ... until, in languages that have one, always executes at least once and does not check the condition before the first execution.)
It is worth noting that in Python a for or while loop can contain break and continue statements and can carry an else clause, where:
break - terminates the loop
continue - moves on to the next time around the loop without executing following code this time around
else - is executed if the loop completed without any break statements being executed.
N.B. In the now unsupported Python 2, range produced a list of integers, but you could use xrange to get a lazy sequence instead. In Python 3, range returns a lazy range object rather than a list.
So the answer to your question is 'it all depends on what you are trying to do'!
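A small sketch of the loop else clause described above; find_first_negative is a hypothetical helper written only for illustration.

```python
def find_first_negative(numbers):
    for n in numbers:
        if n < 0:
            found = n
            break            # leaving via break skips the else clause
    else:
        # Runs only when the loop finished without hitting break.
        found = None
    return found

with_negative = find_first_negative([3, -1, 2])
without_negative = find_first_negative([1, 2])
```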

Pause Python Generator

I have a python generator that does work that produces a large amount of data, which uses up a lot of ram. Is there a way of detecting if the processed data has been "consumed" by the code which is using the generator, and if so, pause until it is consumed?
def multi_grab(urls,proxy=None,ref=None,xpath=False,compress=True,delay=10,pool_size=50,retries=1,http_obj=None):
    if proxy is not None:
        proxy = web.ProxyManager(proxy,delay=delay)
        pool_size = len(pool_size.records)
    work_pool = pool.Pool(pool_size)
    partial_grab = partial(grab,proxy=proxy,post=None,ref=ref,xpath=xpath,compress=compress,include_url=True,retries=retries,http_obj=http_obj)
    for result in work_pool.imap_unordered(partial_grab,urls):
        if result:
            yield result
run from:
if __name__ == '__main__':
    links = set(link for link in grab('http://www.reddit.com',xpath=True).xpath('//a/@href') if link.startswith('http') and 'reddit' not in link)
    print '%s links' % len(links)
    counter = 1
    for url, data in multi_grab(links,pool_size=10):
        print 'got', url, counter, len(data)
        counter += 1
A generator simply yields values. There's no way for the generator to know what's being done with them.
But the generator also pauses constantly, as the caller does whatever it does. It doesn't execute again until the caller invokes it to get the next value. It doesn't run on a separate thread or anything. It sounds like you have a misconception about how generators work. Can you show some code?
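A minimal sketch of that pausing behaviour: the generator body does not run until the caller asks for a value, and it stops again at each yield. The producer function and the log list are illustrative names.

```python
log = []

def producer():
    for i in range(3):
        log.append(f"produced {i}")
        yield i              # execution pauses here until the next request

gen = producer()             # nothing in the body has run yet
log.append("before first next")
first = next(gen)            # runs the body up to the first yield
log.append("after first next")
```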
The point of a generator in Python is to get rid of extra, unneeded objects after each iteration. The only time it will keep those extra objects (and thus extra RAM) is when the objects are being referenced somewhere else (such as adding them to a list). Make sure you aren't saving these variables unnecessarily.
If you're dealing with multithreading/processing, then you probably want to implement a Queue that you could pull data from, keeping track of the number of tasks you're processing.
I think you may be looking for the yield keyword. It is explained in another StackOverflow question: What does the "yield" keyword do in Python?
A solution could be to use a Queue to which the generator would add data, while another part of the code would get data from it and process it. This way you could ensure that there are no more than n items in memory at the same time.
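One way this might look, as a sketch using the standard library's queue and threading modules rather than the asker's own pool code. The names producer and consume_all, the None sentinel, the doubling step and max_in_flight=2 are all assumptions made for illustration.

```python
import queue
import threading

def producer(q, items):
    for item in items:
        q.put(item)          # blocks while the queue already holds maxsize items
    q.put(None)              # sentinel: nothing more to come

def consume_all(items, max_in_flight=2):
    q = queue.Queue(maxsize=max_in_flight)
    worker = threading.Thread(target=producer, args=(q, items))
    worker.start()
    results = []
    while True:
        item = q.get()       # each get frees a slot for the producer
        if item is None:
            break
        results.append(item * 2)   # stand-in for real processing
    worker.join()
    return results

processed = consume_all(range(5))
```

Because the queue is bounded, the producer can never get more than max_in_flight items ahead of the consumer.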
