How to len(generator()) [duplicate] - python

This question already has answers here:
Length of generator output [duplicate]
(9 answers)
What's the shortest way to count the number of items in a generator/iterator?
(7 answers)
Closed 9 years ago.
Python generators are very useful. They have advantages over functions that return lists. However, you could len(list_returning_function()). Is there a way to len(generator_function())?
UPDATE:
Of course len(list(generator_function())) would work.....
I'm trying to use a generator I've created inside a new generator I'm creating. As part of the calculation in the new generator, it needs to know the length of the old one. However, I would like to keep both of them together with the same properties as a generator; specifically, not maintaining the entire list in memory, as it may be very long.
UPDATE 2:
Assume the generator knows its target length even from the first step. Also, there's no reason to maintain the len() syntax. Example: if functions in Python are objects, couldn't I assign the length to an attribute of this object that would be accessible to the new generator?

The conversion to list that's been suggested in the other answers is the best way if you still want to process the generator elements afterwards, but it has one flaw: it uses O(n) memory. You can count the elements in a generator without using that much memory with:
sum(1 for x in generator)
Of course, be aware that this might be slower than len(list(generator)) in common Python implementations, and if the generators are long enough for the memory complexity to matter, the operation would take quite some time. Still, I personally prefer this solution as it describes what I want to get, and it doesn't give me anything extra that's not required (such as a list of all the elements).
Also listen to delnan's advice: if you're discarding the output of the generator, it is very likely that there is a way to calculate the number of elements without running it, or to count them in another manner.
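For instance, when the generator is derived from a sized input, the count is available up front without consuming anything (a sketch; data and squares are made-up names):
data = [2, 3, 5, 7, 11]
squares = (x * x for x in data)  # generator, still unconsumed
length = len(data)               # known from the input, no iteration needed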

Generators have no length, they aren't collections after all.
Generators are functions with an internal state (and fancy syntax). You can repeatedly call them to get a sequence of values, so you can use them in a loop. But they don't contain any elements, so asking for the length of a generator is like asking for the length of a function.
if functions in Python are objects, couldn't I assign the length to a
variable of this object that would be accessible to the new generator?
Functions are objects, and you can in fact attach attributes to a function object. Generator objects, however, don't accept new attributes; the reason is probably to keep such a basic object as efficient as possible.
You can however simply return (generator, length) pairs from your functions or wrap the generator in a simple object like this:
class GeneratorLen(object):
    def __init__(self, gen, length):
        self.gen = gen
        self.length = length

    def __len__(self):
        return self.length

    def __iter__(self):
        return self.gen

g = some_generator()
h = GeneratorLen(g, 1)
print len(h), list(h)
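The (generator, length) pair variant needs no class at all; a minimal sketch, assuming the length is known when the function is called (some_generator_with_len is a made-up name):
def some_generator_with_len(n):
    return (i for i in range(n)), n  # (generator, length) pair

g, length = some_generator_with_len(10)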

Suppose we have a generator:
def gen():
    for i in range(10):
        yield i
We can wrap the generator, along with the known length, in an object:
import itertools

class LenGen(object):
    def __init__(self, gen, length):
        self.gen = gen
        self.length = length

    def __call__(self):
        return itertools.islice(self.gen(), self.length)

    def __len__(self):
        return self.length

lgen = LenGen(gen, 10)
Instances of LenGen behave like generator functions, since calling them returns an iterator.
Now we can use the lgen generator in place of gen, and access len(lgen) as well:
def new_gen():
    for i in lgen():
        yield float(i)/len(lgen)

for i in new_gen():
    print(i)

You can use len(list(generator_function())). However, this consumes the generator, but that's the only way you can find out how many elements are generated. So you may want to save the list somewhere if you also want to use the items.
a = list(generator_function())
print(len(a))
print(a[0])

You can use len(list(generator)), but you could probably make something more efficient if you really intend to discard the results.

You can use reduce.
For Python 3:
>>> import functools
>>> def gen():
...     yield 1
...     yield 2
...     yield 3
...
>>> functools.reduce(lambda x, y: x + 1, gen(), 0)
3
In Python 2, reduce is in the global namespace so the import is unnecessary.

You can use send as a hack:
def counter():
    length = 10
    i = 0
    while i < length:
        val = (yield i)
        if val == 'length':
            yield length
        i += 1

it = counter()
print(next(it))
# 0
print(next(it))
# 1
print(it.send('length'))
# 10
print(next(it))
# 2
print(next(it))
# 3

You can combine the benefits of generators with the certainty of len(), by creating your own iterable object:
class MyIterable(object):
    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __iter__(self):
        self._gen = self._generator()
        return self

    def _generator(self):
        # Put your generator code here
        i = 0
        while i < self.n:
            yield i
            i += 1

    def next(self):
        return next(self._gen)

mi = MyIterable(100)
print len(mi)
for i in mi:
    print i,
This is basically a simple implementation of xrange, which returns an object you can take the len of, but doesn't create an explicit list.

Related

Python - How to access the current state of the range-type object returned by range()?

In Python 3, if I'm not mistaken, the range-type object created by range() generates numbers on-demand for a for-loop, in a way that is somewhat similar to a generator.
I'm wondering if it is possible to access the current value that the range-object is giving to the for-loop (I'm running a thread elsewhere to keep track of progress in a for loop). For example, in:
range_obj = range(0, 10000)
for i in range_obj:
    print(i)
Let's say I'm on the 100th iteration, in which "99" will be printed, and I only have access to the variable range_obj. Is there some method or attribute of range_obj (something in the manner of range_obj.X) that gets me the number "99"? If not, is there any possible way for me to access this number "99" from outside the loop?
I'd try something like this:
class MyRange:
    def __init__(self, *params):
        self.range_iterator = iter(range(*params))
        self.sentinel = None

    def __iter__(self):
        return self

    def __next__(self):
        self.sentinel = next(self.range_iterator)
        return self.sentinel

r = MyRange(0, 1000)
for i in r:
    print(r.sentinel)
Range objects carry no iteration state. There's a common misconception that range returns a generator, but range objects are lazy, immutable sequences, not generators. You can't determine the state of an iteration over a range by looking at the range.
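A quick sketch of the distinction, using standard range behaviour:
r = range(0, 10000)
print(len(r))    # 10000 -- ranges are sized, indexable sequences
print(r[99])     # 99 -- computed on demand, no state involved
it = iter(r)     # the iterator, not the range, holds the position
print(next(it))  # 0
print(next(it))  # 1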

Applying different functions to generator depending on its length

Problem
I have an iterator spam to which I want to apply the function foo if it generates few items, and bar otherwise. In other words, I wish to translate the following code for iterables to generators:
if len(spam) <= max_size_for_foo:
    foo(spam)
else:
    bar(spam)
max_size_for_foo is comparatively small, so there is no problem if a list or other iterable of this length is created. But if spam is long, it must not be converted to a list (otherwise, memory problems ensue).
Dissatisfying solution
The best solution I could come up with so far is the following:
from itertools import chain

first_items = []
try:
    for i in range(max_size_for_foo + 1):
        first_items.append(next(my_generator))
except StopIteration:
    foo(first_items)
else:
    my_generator = chain(first_items, my_generator)
    bar(my_generator)
However, extracting a temporary list and chaining back into the generator feels rather dirty and inelegant to me.
Question
Is there a more elegant or Pythonesque way to do this?
The easiest way is probably to define your generator in a function so that it can be reused:
def spam_func():
    return (i for i in [1, 2, 3])

spam_length = sum(1 for _ in spam_func())
if spam_length <= max_size_for_foo:
    foo(spam_func())
else:
    bar(spam_func())
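If recreating the generator is expensive or impossible, here is a sketch of the question's own peek-ahead idea, tidied up with itertools.islice (names taken from the question):
from itertools import chain, islice

head = list(islice(my_generator, max_size_for_foo + 1))
if len(head) <= max_size_for_foo:
    foo(head)
else:
    bar(chain(head, my_generator))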
There is no such thing as generator length, because a generator can easily be infinite:
def square():
    a = 0
    while True:
        yield a**2
        a += 1
One solution is sum(1 for _ in gen).
Another one may be to wrap your generator in a class, something like this:
class wrapper:
    def __init__(self, items):
        self.items = items

    def __len__(self):
        return len(self.items)

    def generate(self):
        yield from self.items
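A usage sketch (note this only helps when the items already sit in a sized collection, not in a plain generator):
w = wrapper([1, 2, 3])
print(len(w))              # 3
print(list(w.generate()))  # [1, 2, 3]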

restartable generator available in Python Standard Library?

I can't imagine that I'm the first to write a class like this:
class RestartableGenerator:
    def __init__(self, g):
        self.g = g

    def __iter__(self):
        return self.g().__iter__()

if __name__ == '__main__':
    def gen():
        print 'Generating'
        for i in range(5):
            yield i

    i = RestartableGenerator(gen)
    print 'Using'
    print list(i)
    print list(i)
The test produces this output:
Using
Generating
[0, 1, 2, 3, 4]
Generating
[0, 1, 2, 3, 4]
But I didn't find it in the Standard Library. I looked in itertools and functools.
Is it really not there? If it is, where?
Was it considered unnecessary, because when you want to evaluate a sequence multiple times, you better store it in a list?
Edit 1:
My use case is that I want it to be transparent to the consumer that the sequence is, for memory-consumption reasons, a generator instead of a list.
Edit 2: If there's no such class in the Standard Library, what name do you think is appropriate? ParenthesisRemover? MultipleTimesIterable? Anything else? Why?
You could simplify it a little:
class RestartableGenerator:
    def __init__(self, g):
        self.g = g

    def __iter__(self):
        return self.g()
Calling gen() returns a generator object. A generator object has a next method, so it is exactly the kind of object that __iter__ must return.
There is no need for RestartableGenerator, however, since it does nothing that gen itself cannot do. Instead of holding gen in an attribute, just hold on to gen itself.
def gen():
    print 'Generating'
    for i in range(5):
        yield i

print 'Using'
print list(gen())
print list(gen())
This isn't exactly restartable in the sense of going back to the beginning of the sequence. Each __iter__ call creates a new generator that will rerun the generator code, potentially reexecuting side effects and producing different results. If you want independent iterators over a generated sequence, that's what list or itertools.tee are for. Otherwise, it's clearer to explicitly call the generator function again, so this isn't very useful. You save a pair of parentheses at the cost of less explicit, more bug-prone code.
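For independent iterators over a single run, a sketch with itertools.tee, reusing the gen above (tee buffers items in memory as needed):
import itertools

it1, it2 = itertools.tee(gen(), 2)
print(list(it1))  # runs gen's body once: prints Generating, then [0, 1, 2, 3, 4]
print(list(it2))  # [0, 1, 2, 3, 4] -- served from tee's buffer, no second run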
Note that if you want a lazy sequence type, where iterating over it generates elements on the fly, but you can iterate over it repeatedly, you should define its __iter__ method as a generator:
import itertools

class Primes(object):
    def __iter__(self):
        for i in itertools.count():
            if is_prime(i):
                yield i
This isn't a "restartable generator", but it sounds like what you want.

How to remove duplicates in set for objects?

I have a set of objects:
import random

class Test(object):
    def __init__(self):
        self.i = random.randint(1, 10)

res = set()
for i in range(0, 1000):
    res.add(Test())

print len(res)  # 1000
How do I remove duplicates from a set of objects?
Thanks for the answers, this works:
class Test(object):
    def __init__(self, i):
        self.i = i
        # self.i = random.randint(1,10)
        # self.j = random.randint(1,20)

    def __keys(self):
        t = ()
        for key in self.__dict__:
            t = t + (self.__dict__[key],)
        return t

    def __eq__(self, other):
        return isinstance(other, Test) and self.__keys() == other.__keys()

    def __hash__(self):
        return hash(self.__keys())

res = set()
res.add(Test(2))
...
res.add(Test(8))
result: [2,8,3,4,5,6,7]
But how do I preserve order? Sets don't support ordering. Can I use a list instead of a set, for example?
Your objects must be hashable (i.e. must have __eq__() and __hash__() defined) for sets to work properly with them:
class Test(object):
    def __init__(self):
        self.i = random.randint(1, 10)

    def __eq__(self, other):
        return self.i == other.i

    def __hash__(self):
        return self.i
An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.
Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.
If you have several attributes, hash and compare a tuple of them (thanks, delnan):
class Test(object):
    def __init__(self):
        self.i = random.randint(1, 10)
        self.k = random.randint(1, 10)
        self.j = random.randint(1, 10)

    def __eq__(self, other):
        return (self.i, self.k, self.j) == (other.i, other.k, other.j)

    def __hash__(self):
        return hash((self.i, self.k, self.j))
Your first question is already answered by Pavel Anossov.
But you have another question:
But how do I preserve order? Sets don't support ordering. Can I use a list instead of a set, for example?
You can use a list, but there are a few downsides:
You get the wrong interface.
You don't get automatic handling of duplicates. You have to explicitly write if foo not in res: res.append(foo). Obviously, you can wrap this up in a function instead of writing it repeatedly, but it's still extra work.
It's going to be a lot less efficient if the collection can get large. Basically, adding a new element, checking whether an element already exists, etc. are all going to be O(N) instead of O(1).
What you want is something that works like an ordered set. Or, equivalently, like a list that doesn't allow duplicates.
If you do all your adds first, and then all your lookups, and you don't need lookups to be fast, you can get around this by first building a list, then using unique_everseen from the itertools recipes to remove duplicates.
Or you could just keep a set and a list of elements in order (or a list plus a set of elements seen so far). But that can get a bit complicated, so you might want to wrap it up.
Ideally, you want to wrap it up in a type that has exactly the same API as set. Something like an OrderedSet akin to collections.OrderedDict.
Fortunately, if you scroll to the bottom of that docs page, you'll see that exactly what you want already exists; there's a link to an OrderedSet recipe at ActiveState.
So, copy it, paste it into your code, then just change res = set() to res = OrderedSet(), and you're done.
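For illustration, a minimal sketch of the set-plus-list idea described above (the linked ActiveState recipe is more complete; SimpleOrderedSet is just a name for this sketch):
class SimpleOrderedSet(object):
    def __init__(self):
        self._seen = set()  # O(1) membership tests
        self._items = []    # remembers insertion order

    def add(self, item):
        if item not in self._seen:
            self._seen.add(item)
            self._items.append(item)

    def __contains__(self, item):
        return item in self._seen

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        return len(self._items)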
I think you can easily do what you want with a list, as you asked in your first post, since you defined the __eq__ operator:
l = []
if Test(0) not in l:
    l.append(Test(0))
My 2 cts ...
Pavel Anossov's answer is great for allowing your class to be used in a set with the semantics you want. However, if you want to preserve the order of your items, you'll need a bit more. Here's a function that de-duplicates a list, as long as the list items are hashable:
def dedupe(lst):
    seen = set()
    results = []
    for item in lst:
        if item not in seen:
            seen.add(item)
            results.append(item)
    return results
A slightly more idiomatic version would be a generator, rather than a function that returns a list. This gets rid of the results variable, using yield rather than appending the unique values to it. I've also renamed the lst parameter to iterable, since it will work just as well on any iterable object (such as another generator).
def dedupe(iterable):
    seen = set()
    for item in iterable:
        if item not in seen:
            seen.add(item)
            yield item
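For example (made-up data):
items = [3, 1, 3, 2, 1]
print(list(dedupe(items)))  # [3, 1, 2] -- first occurrences, in order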

Is it possible to implement a Python for range loop without an iterator variable?

Is it possible to do the following without the i?
for i in range(some_number):
    # do something
If you just want to do something N times and don't need the iterator.
Off the top of my head, no.
I think the best you could do is something like this:
def loop(f, n):
    for i in xrange(n):
        f()

loop(lambda: <insert expression here>, 5)
But I think you can just live with the extra i variable.
Here is the option to use the _ variable, which in reality is just another variable.
for _ in range(n):
    do_something()
Note that _ is assigned the last result returned in an interactive Python session:
>>> 1+2
3
>>> _
3
For this reason, I would not use it in this manner. I am unaware of any idiom as mentioned by Ryan. It can mess up your interpreter.
>>> for _ in xrange(10): pass
...
>>> _
9
>>> 1+2
3
>>> _
9
And according to Python grammar, it is an acceptable variable name:
identifier ::= (letter|"_") (letter | digit | "_")*
You may be looking for
for _ in itertools.repeat(None, times): ...
this is THE fastest way to iterate times times in Python.
The general idiom for assigning to a value that isn't used is to name it _.
for _ in range(times):
    do_stuff()
What everyone suggesting _ isn't saying is that _ is frequently used as a shortcut to one of the gettext functions, so if you want your software to be available in more than one language, you're best off avoiding it for other purposes.
import gettext
gettext.bindtextdomain('myapplication', '/path/to/my/language/directory')
gettext.textdomain('myapplication')
_ = gettext.gettext
# ...
print _('This is a translatable string.')
Here's a random idea that utilizes (abuses?) the data model (Py3 link).
class Counter(object):
    def __init__(self, val):
        self.val = val

    def __nonzero__(self):
        self.val -= 1
        return self.val >= 0

    __bool__ = __nonzero__  # Alias to Py3 name to make code work unchanged on Py2 and Py3

x = Counter(5)
while x:
    # Do something
    pass
I wonder if there is something like this in the standard libraries?
You can use _11 (or any number, or another otherwise-unused name) to prevent a name collision with gettext. Any underscore-prefixed dummy name like this can be used as the throwaway variable in a for loop.
Maybe the answer depends on what problem you have with using an iterator? Maybe use
i = 100
while i:
    print i
    i -= 1
or
def loop(N, doSomething):
    if not N:
        return
    print doSomething(N)
    loop(N-1, doSomething)

loop(100, lambda a: a)
but frankly I see no point in using such approaches.
Instead of an unneeded counter, now you have an unneeded list.
The best solution is to use a variable that starts with "_"; that tells syntax checkers that you are aware you are not using the variable.
x = range(5)
while x:
    x.pop()
    print "Work!"
I generally agree with the solutions given above, namely with:
Using an underscore in a for loop (2 or more lines)
Defining a normal while counter (3 or more lines)
Declaring a custom class with a __nonzero__ implementation (many more lines)
If one is to define an object as in #3, I would recommend implementing the context manager protocol for the with keyword, or applying contextlib.
Further, I propose yet another solution. It is a 3-liner and not of supreme elegance, but it uses the itertools package and thus might be of interest.
from itertools import (chain, repeat)

times = chain(repeat(True, 2), repeat(False))
while next(times):
    print 'do stuff!'
In this example, 2 is the number of times to iterate the loop. chain wraps two repeat iterators, the first limited but the second infinite. Remember that these are true iterator objects, hence they do not require infinite memory. Obviously this is much slower than solution #1. Unless written as part of a function, it might require a cleanup of the times variable.
We have had some fun with the following; interesting to share:
class RepeatFunction:
    def __init__(self, n=1):
        self.n = n

    def __call__(self, Func):
        for i in xrange(self.n):
            Func()
        return Func

#----usage
k = 0

@RepeatFunction(7)  # decorator for repeating function
def Job():
    global k
    print k
    k += 1

print '---------'
Job()
Results:
0
1
2
3
4
5
6
---------
7
If do_something is a simple function or can be wrapped in one, a simple map() can run do_something some_number times:
# Py2 version - map is eager, so it can be used alone
map(do_something, xrange(some_number))
# Py3 version - map is lazy, so it must be consumed to do the work at all;
# wrapping in list() would be equivalent to Py2, but if you don't use the return
# value, it's wastefully creating a temporary, possibly huge, list of junk.
# collections.deque with maxlen 0 can efficiently run a generator to exhaustion without
# storing any of the results; the itertools consume recipe uses it for that purpose.
from collections import deque
deque(map(do_something, range(some_number)), 0)
If you want to pass arguments to do_something, you may also find the itertools repeatfunc recipe reads well:
To pass the same arguments:
from collections import deque
from itertools import repeat, starmap
args = (..., my args here, ...)
# Same as Py3 map above, you must consume starmap (it's a lazy generator, even on Py2)
deque(starmap(do_something, repeat(args, some_number)), 0)
To pass different arguments:
argses = [(1, 2), (3, 4), ...]
deque(starmap(do_something, argses), 0)
We can use while and yield to create our own loop function, like this. You can refer to the official documentation on yield.
def my_loop(start, n, step=1):
    while start < n:
        yield start
        start += step

for x in my_loop(0, 15):
    print(x)
#Return first n items of the iterable as a list
list(itertools.islice(iterable, n))
Taken from http://docs.python.org/2/library/itertools.html
If you really want to avoid putting something with a name (either an iteration variable as in the OP, or unwanted list or unwanted generator returning true the wanted amount of time) you could do it if you really wanted:
for type('', (), {}).x in range(somenumber):
    dosomething()
The trick used here is to create an anonymous class, type('', (), {}), which results in a class with an empty name, but note that it is not inserted into the local or global namespace (even if a nonempty name were supplied). Then you use an attribute of that class as the iteration variable, which is unreachable since the class it belongs to is unreachable.
What about:
while range(some_number):
    # do something -- careful: a non-empty range is always truthy, so this loops forever
