Python 3 generator comprehension to generate chunks including last - python

If you have a list in Python 3.7:
>>> li
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
You can turn that into a list of chunks each of length n with one of two common Python idioms:
>>> n=3
>>> list(zip(*[iter(li)]*n))
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
Which drops the last incomplete tuple since (9,10) is not length n
You can also do:
>>> [li[i:i+n] for i in range(0,len(li),n)]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
if you want the last sub list even if it has less than n elements.
Suppose now I have a generator, gen, of unknown length or termination (so calling list(gen) or sum(1 for _ in gen) would not be wise), and I want every chunk.
The best generator expression that I have been able to come up with is something along these lines:
from itertools import zip_longest
sentinel=object() # for use in filtering out ending chunks
gen=(e for e in range(11)) # fill in for the actual gen
g3=(t if sentinel not in t else tuple(filter(lambda x: x != sentinel, t)) for t in zip_longest(*[iter(gen)]*n,fillvalue=sentinel))
That works for the intended purpose:
>>> next(g3)
(0, 1, 2)
>>> next(g3)
(3, 4, 5)
>>> list(g3)
[(6, 7, 8), (9, 10)]
It just seems -- clumsy. I tried:
using islice but the lack of length seems hard to surmount;
using a sentinel in iter but the sentinel version of iter requires a callable, not an iterable.
Is there a more idiomatic Python 3 technique for a generator of chunks of length n including the last chunk that might be less than n?
I am open to a generator function as well. I am looking for something idiomatic and mostly more readable.
Update:
DSM's method in his deleted answer is very good I think:
>>> g3=(iter(lambda it=iter(gen): tuple(islice(it, n)), ()))
>>> next(g3)
(0, 1, 2)
>>> list(g3)
[(3, 4, 5), (6, 7, 8), (9, 10)]
I am open to this question being a dup but the linked question is almost 10 years old and focused on a list. There is no new method in Python 3 with generators where you don't know the length and don't want any more than a chunk at a time?

I think this is always going to be messy as long as you're trying to fit this into a one liner.
I would just bite the bullet and go with a generator function here. Especially useful if you don't know the actual size (say, if gen is an infinite generator, etc).
from itertools import islice
def chunk(gen, k):
    """Efficiently split `gen` into chunks of size `k`.
    Args:
        gen: Iterator to chunk.
        k: Number of elements per chunk.
    Yields:
        Chunks as a list.
    """
    while True:
        chunk = [*islice(gen, 0, k)]
        if chunk:
            yield chunk
        else:
            break
>>> gen = iter(list(range(11)))
>>> list(chunk(gen, 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
Someone may have a better suggestion, but this is how I'd do it.
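Because islice pulls at most k items per call, this approach never looks further ahead than one chunk, so it should also cope with an endless source. A minimal sketch, assuming Python 3; the name chunked_lazy is hypothetical and not part of the answer above:

```python
from itertools import count, islice

def chunked_lazy(iterable, k):
    """Yield lists of up to k items; the final list may be shorter."""
    it = iter(iterable)              # accept any iterable, not only iterators
    while True:
        block = list(islice(it, k))  # consumes at most k items
        if not block:                # source exhausted
            return
        yield block

# Only the chunks actually requested are built, so an infinite source is fine.
first_two = list(islice(chunked_lazy(count(), 3), 2))
print(first_two)                         # [[0, 1, 2], [3, 4, 5]]
print(list(chunked_lazy(range(11), 3)))  # last chunk is [9, 10]
```

Calling iter() inside the function is a small design choice: it lets callers pass a list or range directly instead of wrapping it first.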

This feels like a pretty reasonable approach that builds just on itertools.
>>> from itertools import count, islice, takewhile
>>> g = (i for i in range(10))
>>> g3 = takewhile(lambda x: x, (list(islice(g,3)) for _ in count(0)))
>>> list(g3)
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

I have put together some timings for the answers here.
The way I originally wrote it is actually the fastest on Python 3.7. For a one liner, that is likely the best.
A modified version of cold speed's answer is both fast and Pythonic and readable.
The other answers are all similar speed.
The benchmark:
from __future__ import print_function
try:
    from itertools import zip_longest, takewhile, islice, count
except ImportError:
    from itertools import takewhile, islice, count
    from itertools import izip_longest as zip_longest
from collections import deque
def f1(it,k):
    sentinel=object()
    for t in (t if sentinel not in t else tuple(filter(lambda x: x != sentinel, t)) for t in zip_longest(*[iter(it)]*k, fillvalue=sentinel)):
        yield t
def f2(it,k):
    for t in (iter(lambda it=iter(it): tuple(islice(it, k)), ())):
        yield t
def f3(it,k):
    while True:
        chunk = (*islice(it, 0, k),) # tuple(islice(it, 0, k)) if Python < 3.5
        if chunk:
            yield chunk
        else:
            break
def f4(it,k):
    for t in takewhile(lambda x: x, (tuple(islice(it,k)) for _ in count(0))):
        yield t
if __name__=='__main__':
    import timeit
    def tf(f, k, x):
        data=(y for y in range(x))
        return deque(f(data, k), maxlen=3)
    k=3
    for f in (f1,f2,f3,f4):
        print(f.__name__, tf(f,k,100000))
    for case, x in (('small',10000),('med',100000),('large',1000000)):
        print("Case {}, {:,} x {}".format(case,x,k))
        for f in (f1,f2,f3,f4):
            print(" {:^10s}{:.4f} secs".format(f.__name__, timeit.timeit("tf(f, k, x)", setup="from __main__ import f, tf, x, k", number=10)))
And the results:
f1 deque([(99993, 99994, 99995), (99996, 99997, 99998), (99999,)], maxlen=3)
f2 deque([(99993, 99994, 99995), (99996, 99997, 99998), (99999,)], maxlen=3)
f3 deque([(99993, 99994, 99995), (99996, 99997, 99998), (99999,)], maxlen=3)
f4 deque([(99993, 99994, 99995), (99996, 99997, 99998), (99999,)], maxlen=3)
Case small, 10,000 x 3
f1 0.0125 secs
f2 0.0231 secs
f3 0.0185 secs
f4 0.0250 secs
Case med, 100,000 x 3
f1 0.1239 secs
f2 0.2270 secs
f3 0.1845 secs
f4 0.2477 secs
Case large, 1,000,000 x 3
f1 1.2140 secs
f2 2.2431 secs
f3 1.7967 secs
f4 2.4697 secs

This solution with a generator function is fairly explicit and short:
import itertools
def g3(seq):
    it = iter(seq)
    while True:
        head = list(itertools.islice(it, 3))
        if head:
            yield head
        else:
            break

The itertools recipe section of the doc offers various generator helpers.
Here you can modify take with the second form of iter to create a chunk generator.
from itertools import islice
def chunks(n, it):
    it = iter(it)
    return iter(lambda: tuple(islice(it, n)), ())
Example
li = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(*chunks(3, li))
Output
(0, 1, 2) (3, 4, 5) (6, 7, 8) (9, 10)
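The "second form of iter" is iter(callable, sentinel): it calls the callable repeatedly and stops as soon as the return value equals the sentinel. A small sketch of why the empty tuple works as the stop signal here; read_chunk_factory is a hypothetical helper introduced only for illustration:

```python
from itertools import islice

def read_chunk_factory(iterable, n):
    """Return a zero-argument callable that hands out the next n items."""
    it = iter(iterable)
    def read_chunk():
        return tuple(islice(it, n))   # () once the source is exhausted
    return read_chunk

# Calling by hand shows the empty tuple appearing at the end.
read_chunk = read_chunk_factory(range(7), 3)
calls = [read_chunk() for _ in range(4)]
print(calls)          # [(0, 1, 2), (3, 4, 5), (6,), ()]

# iter(callable, sentinel) does the same loop and stops at the sentinel ().
chunks_seen = list(iter(read_chunk_factory(range(7), 3), ()))
print(chunks_seen)    # [(0, 1, 2), (3, 4, 5), (6,)]
```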

more_itertools.chunked:
list(more_itertools.chunked(range(11), 3))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]
See also the source:
iter(functools.partial(more_itertools.take, n, iter(iterable)), [])

My attempt using groupby and cycle. With cycle you can choose a pattern how to group your elements, so it's versatile:
from itertools import groupby, cycle
gen=(e for e in range(11))
d = [list(g) for d, g in groupby(gen, key=lambda v, c=cycle('000111'): next(c))]
print([v for v in d])
Outputs:
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

We can do this by using the grouper function given in the itertools documentation page.
from itertools import zip_longest
def grouper(iterable, n, fillvalue=None):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)
def out_iterator(lst):
    for each in grouper(lst,n):
        if None in each:
            yield each[:each.index(None)]
        else:
            yield each
a=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
n=3
print(list(out_iterator(a)))
Output:
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10)]
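For readers on newer interpreters: Python 3.12 added itertools.batched, which yields tuples of up to n items and keeps a shorter final tuple. The sketch below uses it when available and otherwise falls back to an islice loop; the fallback is my own approximation of the documented behaviour, not the standard library source.

```python
from itertools import islice

try:
    from itertools import batched          # available from Python 3.12
except ImportError:
    def batched(iterable, n):
        """Fallback sketch: tuples of up to n items, last one may be shorter."""
        if n < 1:
            raise ValueError("n must be at least one")
        it = iter(iterable)
        while True:
            batch = tuple(islice(it, n))
            if not batch:
                return
            yield batch

gen = (e for e in range(11))
result = list(batched(gen, 3))
print(result)   # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10)]
```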

Related

Accessing consecutive items when using a generator

Lets say I have a tuple generator, which I simulate as follows:
g = (x for x in (1,2,3,97,98,99))
For this specific generator, I wish to write a function to output the following:
(1,2,3)
(2,3,97)
(3,97,98)
(97,98,99)
(98,99)
(99)
So I'm iterating over three consecutive items at a time and printing them, except when I approach the end.
Should the first line in my function be:
t = tuple(g)
In other words, is it best to work on a tuple directly or might it be beneficial to work with a generator. If it is possible to approach this problem using both methods, please state the benefits and disadvantages for both approaches. Also, if it might be wise to use the generator approach, how might such a solution look?
Here's what I currently do:
def f(data, l):
    t = tuple(data)
    for j in range(len(t)):
        print(t[j:j+l])
data = (x for x in (1,2,3,4,5))
f(data,3)
UPDATE:
Note that I've updated my function to take a second argument specifying the length of the window.
A specific example for returning three items could read
def yield3(gen):
    b, c = next(gen), next(gen)
    try:
        while True:
            a, b, c = b, c, next(gen)
            yield (a, b, c)
    except StopIteration:
        yield (b, c)
        yield (c,)
g = (x for x in (1,2,3,97,98,99))
for l in yield3(g):
    print(l)
Actually there're functions for this in itertools module - tee() and izip_longest():
>>> from itertools import izip_longest, tee
>>> g = (x for x in (1,2,3,97,98,99))
>>> a, b, c = tee(g, 3)
>>> next(b, None)
>>> next(c, None)
>>> next(c, None)
>>> [[x for x in l if x is not None] for l in izip_longest(a, b, c)]
[[1, 2, 3], [2, 3, 97], [3, 97, 98], [97, 98, 99], [98, 99], [99]]
from documentation:
Return n independent iterators from a single iterable. Equivalent to:
def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque: # when the local deque is empty
                newval = next(it) # fetch a new value and
                for d in deques: # load it to all the deques
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)
If you might need to take more than three elements at a time, and you don't want to load the whole generator into memory, I suggest using a deque from the collections module in the standard library to store the current set of items. A deque (pronounced "deck" and meaning "double-ended queue") can have values pushed and popped efficiently from both ends.
from collections import deque
from itertools import islice
def get_tuples(gen, n):
    q = deque(islice(gen, n)) # pre-load the queue with `n` values
    while q: # run until the queue is empty
        yield tuple(q) # yield a tuple copied from the current queue
        q.popleft() # remove the oldest value from the queue
        try:
            q.append(next(gen)) # try to add a new value from the generator
        except StopIteration:
            pass # but we don't care if there are none left
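To see the deque approach produce the output asked for in the question, here is a self-contained Python 3 sketch. It is a variation rather than the answer's exact code: the name sliding_tuples is hypothetical, and it calls iter() first so that plain lists and tuples work as well as generators.

```python
from collections import deque
from itertools import islice

def sliding_tuples(iterable, n):
    """Yield overlapping windows of up to n items, shrinking at the end."""
    it = iter(iterable)
    window = deque(islice(it, n))        # pre-load the first n values
    while window:
        yield tuple(window)
        window.popleft()                 # drop the oldest value
        for item in islice(it, 1):       # append at most one new value
            window.append(item)

g = (x for x in (1, 2, 3, 97, 98, 99))
windows = list(sliding_tuples(g, 3))
for w in windows:
    print(w)
# (1, 2, 3)
# (2, 3, 97)
# (3, 97, 98)
# (97, 98, 99)
# (98, 99)
# (99,)
```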
Actually, it depends.
A generator is useful for very large collections, where you don't need to hold everything in memory to get the result you want.
On the other hand, since you have to print the items, it seems safe to guess that the collection isn't huge, so it doesn't make much difference.
However, here is a generator that achieves what you were looking for:
def part(gen, size):
    t = tuple()
    try:
        while True:
            l = next(gen)
            if len(t) < size:
                t = t + (l,)
                if len(t) == size:
                    yield t
                continue
            if len(t) == size:
                t = t[1:] + (l,)
                yield t
                continue
    except StopIteration:
        while len(t) > 1:
            t = t[1:]
            yield t
>>> a = (x for x in range(10))
>>> list(part(a, 3))
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9), (8, 9), (9,)]
>>> a = (x for x in range(10))
>>> list(part(a, 5))
[(0, 1, 2, 3, 4), (1, 2, 3, 4, 5), (2, 3, 4, 5, 6), (3, 4, 5, 6, 7), (4, 5, 6, 7, 8), (5, 6, 7, 8, 9), (6, 7, 8, 9), (7, 8, 9), (8, 9), (9,)]
>>>
note: the code actually isn't very elegant but it works also when you have to split in, say, 5 pieces
It's definitely best to work with the generator because you don't want to have to hold everything in memory.
It can be done very simply with a deque.
from collections import deque
from itertools import islice
def overlapping_chunks(size, iterable, *, head=False, tail=False):
    """
    Get overlapping subsections of an iterable of a specified size.
        print(*overlapping_chunks(3, (1,2,3,97,98,99)))
        #>>> [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99]
    If head is truthy, the "warm up" before the specified maximum
    number of items is included.
        print(*overlapping_chunks(3, (1,2,3,97,98,99), head=True))
        #>>> [1] [1, 2] [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99]
    If tail is truthy, the "cool down" after the iterable is exhausted
    is included.
        print(*overlapping_chunks(3, (1,2,3,97,98,99), tail=True))
        #>>> [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99] [98, 99] [99]
    """
    chunker = deque(maxlen=size)
    iterator = iter(iterable)
    for item in islice(iterator, size-1):
        chunker.append(item)
        if head:
            yield list(chunker)
    for item in iterator:
        chunker.append(item)
        yield list(chunker)
    if tail:
        while len(chunker) > 1:
            chunker.popleft()
            yield list(chunker)
I think what you currently do seems a lot easier than any of the above. If there isn't any particular need to make it more complicated, my opinion would be to keep it simple. In other words, it is best to work on a tuple directly.
Here's a generator that works in both Python 2.7.17 and 3.8.1. Internally it uses iterators and generators whenever possible, so it should be relatively memory efficient. Note that it calls iter(iterable) n times, so the input must be re-iterable (a tuple or list, say) and hold at least n - 1 items; a one-shot generator will not work.
try:
    from itertools import izip, izip_longest, takewhile
except ImportError: # Python 3
    izip = zip
    from itertools import zip_longest as izip_longest, takewhile
def tuple_window(n, iterable):
    iterators = [iter(iterable) for _ in range(n)]
    for n, iterator in enumerate(iterators):
        for _ in range(n):
            next(iterator)
    _NULL = object() # Unique singleton object.
    for t in izip_longest(*iterators, fillvalue=_NULL):
        yield tuple(takewhile(lambda v: v is not _NULL, t))
if __name__ == '__main__':
    data = (1, 2, 3, 97, 98, 99)
    for t in tuple_window(3, data):
        print(t)
Output:
(1, 2, 3)
(2, 3, 97)
(3, 97, 98)
(97, 98, 99)
(98, 99)
(99,)

Multiply Adjacent Elements

I have a tuple of integers such as (1, 2, 3, 4, 5) and I want to produce the tuple (1*2, 2*3, 3*4, 4*5) by multiplying adjacent elements. Is it possible to do this with a one-liner?
Short and sweet. Remember that zip only runs as long as the shortest input.
print tuple(x*y for x,y in zip(t,t[1:]))
>>> t = (1, 2, 3, 4, 5)
>>> print tuple(t[i]*t[i+1] for i in range(len(t)-1))
(2, 6, 12, 20)
Not the most pythonic of solutions though.
I like the recipes from itertools:
from itertools import izip, tee
def pairwise(iterable):
    xs, ys = tee(iterable)
    next(ys)
    return izip(xs, ys)
print [a * b for a, b in pairwise(range(10))]
Result:
[0, 2, 6, 12, 20, 30, 42, 56, 72]
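The recipe above is written for Python 2 (izip, print statement). A Python 3 sketch of the same idea follows; it swaps izip for the built-in zip and passes a default to next() so that an empty input does not raise. On Python 3.10 and later, itertools.pairwise provides this directly.

```python
from itertools import tee

def pairwise(iterable):
    """s -> (s0, s1), (s1, s2), (s2, s3), ..."""
    xs, ys = tee(iterable)
    next(ys, None)        # advance the second copy; default avoids StopIteration
    return zip(xs, ys)    # zip stops when the shorter (advanced) copy runs out

t = (1, 2, 3, 4, 5)
products = tuple(a * b for a, b in pairwise(t))
print(products)   # (2, 6, 12, 20)
```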
If t is your tuple:
>>> tuple(t[x]*t[x+1] for x in range(len(t)-1))
(2, 6, 12, 20)
And another solution with lovely map:
>>> tuple(map(lambda x,y:x*y, t[1:], t[:-1]))
(2, 6, 12, 20)
Edit:
And if you worry about the extra memory consumption of the slices, you can use islice from itertools, which will iterate over your tuple (thx #eyquem):
>>> tuple(map(lambda x,y:x*y, islice(t, 1, None), islice(t, 0, len(t)-1)))
(2, 6, 12, 20)
tu = (1, 2, 3, 4, 5)
it = iter(tu).next
it()
print tuple(a*it() for a in tu)
I timed various code:
from random import choice
from time import clock
from itertools import izip
tu = tuple(choice(range(0,87)) for i in xrange(2000))
A,B,C,D = [],[],[],[]
for n in xrange(50):
    rentime = 100
    te = clock()
    for ren in xrange(rentime): # indexing
        tuple(tu[x]*tu[x+1] for x in range(len(tu)-1))
    A.append(clock()-te)
    te = clock()
    for ren in xrange(rentime): # zip
        tuple(x*y for x,y in zip(tu,tu[1:]))
    B.append(clock()-te)
    te = clock()
    for ren in xrange(rentime): # iter
        it = iter(tu).next
        it()
        tuple(a*it() for a in tu)
    C.append(clock()-te)
    te = clock()
    for ren in xrange(rentime): # izip
        tuple(x*y for x,y in izip(tu,tu[1:]))
    D.append(clock()-te)
print 'indexing ',min(A)
print 'zip ',min(B)
print 'iter ',min(C)
print 'izip ',min(D)
result
indexing 0.135054036197
zip 0.134594201218
iter 0.100380634969
izip 0.0923947037962
izip is better than zip : - 31 %
My solution isn't so bad (I didn't think so by the way): -25 % relatively to zip, 10 % more time than champion izip
I'm surprised that indexing isn't faster than zip : nneonneo is right, zip is acceptable

Cartesian product of large iterators (itertools)

From a previous question I learned something interesting. If Python's itertools.product is fed a series of iterators, these iterators will be converted into tuples before the Cartesian product begins. Related questions look at the source code of itertools.product to conclude that, while no intermediate results are stored in memory, tuple versions of the original iterators are created before the product iteration begins.
Question: Is there a way to create an iterator to a Cartesian product when the (tuple converted) inputs are too large to hold in memory? Trivial example:
import itertools
A = itertools.permutations(xrange(100))
itertools.product(A)
A more practical use case would take in a series of (*iterables[, repeat]) like the original implementation of the function; the above is just an example. It doesn't look like you can use the current implementation of itertools.product, so I welcome submissions in pure Python (though you can't beat the C backend of itertools!).
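The eager conversion described in the question can be observed directly. In this Python 3 sketch, noisy is a hypothetical generator that records every value it hands out; after asking product for a single result, the whole input has already been read.

```python
import itertools

consumed = []

def noisy(n):
    """Yield 0..n-1 and record each value as it is pulled."""
    for i in range(n):
        consumed.append(i)
        yield i

p = itertools.product(noisy(3), repeat=2)
first = next(p)                 # request just one result tuple
print(first)                    # (0, 0)
print(consumed)                 # [0, 1, 2]: the input was fully drained
```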
Here's an implementation which calls callables and iterates iterables, which are assumed restartable:
def product(*iterables, **kwargs):
    if len(iterables) == 0:
        yield ()
    else:
        iterables = iterables * kwargs.get('repeat', 1)
        it = iterables[0]
        for item in it() if callable(it) else iter(it):
            for items in product(*iterables[1:]):
                yield (item, ) + items
Testing:
import itertools
g = product(lambda: itertools.permutations(xrange(100)),
lambda: itertools.permutations(xrange(100)))
print next(g)
print sum(1 for _ in g)
Without "iterator recreation", it may be possible for the first of the factors. But that would save only 1/n space (where n is the number of factors) and add confusion.
So the answer is iterator recreation. A client of the function would have to ensure that the creation of the iterators is pure (no side-effects). Like
def iterProduct(ic):
    if not ic:
        yield []
        return
    for i in ic[0]():
        for js in iterProduct(ic[1:]):
            yield [i] + js
# Test
x3 = lambda: xrange(3)
for i in iterProduct([x3,x3,x3]):
    print i
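The iterator-recreation idea translates to Python 3 with little change. In this sketch each factor is a zero-argument factory that returns a fresh iterator; lazy_product is a hypothetical name, and the factories are assumed to be free of side effects, as the answer requires.

```python
from itertools import permutations

def lazy_product(*factories):
    """Cartesian product over iterator factories, recreating inner iterators."""
    if not factories:
        yield ()
        return
    head, rest = factories[0], factories[1:]
    for item in head():                     # fresh iterator for this factor
        for tail in lazy_product(*rest):    # inner factors restart every time
            yield (item,) + tail

digits = lambda: iter(range(3))
pairs = list(lazy_product(digits, digits))
print(pairs[:4])    # [(0, 0), (0, 1), (0, 2), (1, 0)]

# Nothing is materialised up front, so huge factors are fine to start on.
big = lazy_product(lambda: permutations(range(100)),
                   lambda: permutations(range(100)))
first = next(big)
```

Memory use stays at one live iterator per factor, at the cost of regenerating the inner factors once per outer item.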
This can't be done with standard Python generators, because some of the iterables must be cycled through multiple times. You have to use some kind of datatype capable of "reiteration." I've created a simple "reiterable" class and a non-recursive product algorithm. product should have more error-checking, but this is at least a first approach. The simple reiterable class...
class PermutationsReiterable(object):
    def __init__(self, value):
        self.value = value
    def __iter__(self):
        return itertools.permutations(xrange(self.value))
And product itself...
def product(*reiterables, **kwargs):
    if not reiterables:
        yield ()
        return
    reiterables *= kwargs.get('repeat', 1)
    iterables = [iter(ri) for ri in reiterables]
    try:
        states = [next(it) for it in iterables]
    except StopIteration:
        # outer product of zero-length iterable is empty
        return
    yield tuple(states)
    current_index = max_index = len(iterables) - 1
    while True:
        try:
            next_item = next(iterables[current_index])
        except StopIteration:
            if current_index > 0:
                new_iter = iter(reiterables[current_index])
                next_item = next(new_iter)
                states[current_index] = next_item
                iterables[current_index] = new_iter
                current_index -= 1
            else:
                # last iterable has run out; terminate generator
                return
        else:
            states[current_index] = next_item
            current_index = max_index
            yield tuple(states)
Tested:
>>> pi2 = PermutationsReiterable(2)
>>> list(pi2); list(pi2)
[(0, 1), (1, 0)]
[(0, 1), (1, 0)]
>>> list(product(pi2, repeat=2))
[((0, 1), (0, 1)), ((0, 1), (1, 0)), ((1, 0), (0, 1)), ((1, 0), (1, 0))]
>>> giant_product = product(PermutationsReiterable(100), repeat=5)
>>> len(list(itertools.islice(giant_product, 0, 5)))
5
>>> big_product = product(PermutationsReiterable(10), repeat=2)
>>> list(itertools.islice(big_product, 0, 5))
[((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 8, 9)),
((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 7, 9, 8)),
((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 8, 7, 9)),
((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 8, 9, 7)),
((0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (0, 1, 2, 3, 4, 5, 6, 9, 7, 8))]
I'm sorry to revive this topic, but after spending hours debugging a program that iterates over a recursively generated Cartesian product of generators, I can tell you that none of the solutions above work unless you are working with constant numbers, as in all the examples above.
Correction :
from itertools import tee
def product(*iterables, **kwargs):
    if len(iterables) == 0:
        yield ()
    else:
        # Convert to a list: the slice assignment below fails on a tuple.
        iterables = list(iterables) * kwargs.get('repeat', 1)
        it = iterables[0]
        for item in it() if callable(it) else iter(it):
            iterables_tee = list(map(tee, iterables[1:]))
            iterables[1:] = [it1 for it1, it2 in iterables_tee]
            iterable_copy = [it2 for it1, it2 in iterables_tee]
            for items in product(*iterable_copy):
                yield (item, ) + items
If your generators contain generators, you need to pass a copy to the recursive call.

How to write a generator that returns ALL-BUT-LAST items in the iterable in Python?

I asked some similar questions [1, 2] yesterday and got great answers, but I am not yet technically skilled enough to write a generator of such sophistication myself.
How could I write a generator that would raise StopIteration if it's the last item, instead of yielding it?
I am thinking I should somehow ask two values at a time, and see if the 2nd value is StopIteration. If it is, then instead of yielding the first value, I should raise this StopIteration. But somehow I should also remember the 2nd value that I asked if it wasn't StopIteration.
I don't know how to write it myself. Please help.
For example, if the iterable is [1, 2, 3], then the generator should return 1 and 2.
Thanks, Boda Cydo.
[1] How do I modify a generator in Python?
[2] How to determine if the value is ONE-BUT-LAST in a Python generator?
This should do the trick:
def allbutlast(iterable):
    it = iter(iterable)
    current = it.next()
    for i in it:
        yield current
        current = i
>>> list(allbutlast([1,2,3]))
[1, 2]
This will iterate through the entire list, and return the previous item so the last item is never returned.
Note that calling the above on both [] and [1] will return an empty list.
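The version above relies on Python 2: it.next() does not exist in Python 3, and since PEP 479 a StopIteration escaping a generator body becomes a RuntimeError, so the empty-input case needs explicit handling. A Python 3 sketch of the same one-item look-ahead (all_but_last is a hypothetical name):

```python
def all_but_last(iterable):
    """Yield every item except the final one, holding one item back."""
    it = iter(iterable)
    try:
        current = next(it)
    except StopIteration:
        return                    # empty input: nothing to yield
    for following in it:
        yield current             # safe to emit: something comes after it
        current = following

print(list(all_but_last([1, 2, 3])))   # [1, 2]
print(list(all_but_last([1])))         # []
print(list(all_but_last([])))          # []
```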
First off, is a generator really needed? This sounds like the perfect job for Python’s slices syntax:
result = my_range[ : -1]
I.e.: take a range from the first item to the one before the last.
The itertools module shows a pairwise() function in its recipes. Adapting this recipe, you can get your generator:
from itertools import *
def n_apart(iterable, n):
    a,b = tee(iterable)
    for count in range(n):
        next(b)
    return zip(a,b)
def all_but_n_last(iterable, n):
    return (value for value,dummy in n_apart(iterable, n))
The n_apart() function returns pairs of values that are n elements apart in the input iterable. all_but_n_last() returns the first value of each pair, which incidentally skips the last n elements of the input.
>>> data = range(10)
>>> list(data)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(n_apart(data,3))
[(0, 3), (1, 4), (2, 5), (3, 6), (4, 7), (5, 8), (6, 9)]
>>> list(all_but_n_last(data,3))
[0, 1, 2, 3, 4, 5, 6]
>>>
>>> list(all_but_n_last(data,1))
[0, 1, 2, 3, 4, 5, 6, 7, 8]
The more_itertools project has a tool that emulates itertools.islice with support for negative indices:
import more_itertools as mit
list(mit.islice_extended([1, 2, 3], None, -1))
# [1, 2]
gen = (x for x in iterable[:-1])

Iteration over list slices

I want an algorithm to iterate over list slices. Slices size is set outside the function and can differ.
In my mind it is something like:
for list_of_x_items in fatherList:
    foo(list_of_x_items)
Is there a way to properly define list_of_x_items or some other way of doing this using python 2.5?
edit1: Clarification Both "partitioning" and "sliding window" terms sound applicable to my task, but I am no expert. So I will explain the problem a bit deeper and add to the question:
The fatherList is a multilevel numpy.array I am getting from a file. The function has to find averages of series (the user provides the length of the series). For averaging I am using the mean() function. Now for the question expansion:
edit2: How can I modify the function you have provided to store the extra items and use them when the next fatherList is fed to the function?
For example, if the list has length 10 and the chunk size is 3, then the 10th member of the list is stored and prepended to the next list.
Related:
What is the most “pythonic” way to iterate over a list in chunks?
If you want to divide a list into slices you can use this trick:
list_of_slices = zip(*(iter(the_list),) * slice_size)
For example
>>> zip(*(iter(range(10)),) * 3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
If the number of items is not divisible by the slice size and you want to pad the list with None you can do this:
>>> map(None, *(iter(range(10)),) * 3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
It is a dirty little trick
OK, I'll explain how it works. It'll be tricky to explain but I'll try my best.
First a little background:
In Python you can multiply a list by a number like this:
[1, 2, 3] * 3 -> [1, 2, 3, 1, 2, 3, 1, 2, 3]
([1, 2, 3],) * 3 -> ([1, 2, 3], [1, 2, 3], [1, 2, 3])
And an iterator object can be consumed once like this:
>>> l=iter([1, 2, 3])
>>> l.next()
1
>>> l.next()
2
>>> l.next()
3
The zip function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. For example:
zip([1, 2, 3], [20, 30, 40]) -> [(1, 20), (2, 30), (3, 40)]
zip(*[(1, 20), (2, 30), (3, 40)]) -> [(1, 2, 3), (20, 30, 40)]
The * in front of the argument to zip is used to unpack it into separate arguments. You can find more details here.
So
zip(*[(1, 20), (2, 30), (3, 40)])
is actually equivalent to
zip((1, 20), (2, 30), (3, 40))
but works with a variable number of arguments
Now back to the trick:
list_of_slices = zip(*(iter(the_list),) * slice_size)
iter(the_list) -> convert the list into an iterator
(iter(the_list),) * N -> creates a tuple of N references to the same iterator over the_list.
zip(*(iter(the_list),) * N) -> feeds that tuple of iterators into zip, which groups the items into N-sized tuples. Since all N entries are in fact references to the same iterator iter(the_list), the result comes from repeated calls to next() on the original iterator.
I hope that explains it. I advise you to go with an easier-to-understand solution. I was only tempted to mention this trick because I like it.
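The explanation can be checked step by step. This sketch uses Python 3, where zip is lazy and has to be wrapped in list(); it confirms that every entry is the same iterator object and shows that the item left over after the last full group is consumed and silently dropped.

```python
items = list(range(10))
it = iter(items)
refs = (it,) * 3                     # three references, one iterator
same = all(r is it for r in refs)
print(same)                          # True

# zip calls next() on each reference in turn, i.e. three times on `it`.
chunks = list(zip(*refs))
print(chunks)                        # [(0, 1, 2), (3, 4, 5), (6, 7, 8)]

# zip pulled 9 for a fourth group, then hit the end and discarded it.
leftover = list(it)
print(leftover)                      # []
```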
If you want to be able to consume any iterable you can use these functions:
from itertools import chain, islice
def ichunked(seq, chunksize):
    """Yields items from an iterator in iterable chunks."""
    it = iter(seq)
    while True:
        yield chain([it.next()], islice(it, chunksize-1))
def chunked(seq, chunksize):
    """Yields items from an iterator in list chunks."""
    for chunk in ichunked(seq, chunksize):
        yield list(chunk)
Use a generator:
big_list = [1,2,3,4,5,6,7,8,9]
slice_length = 3
def sliceIterator(lst, sliceLen):
    for i in range(len(lst) - sliceLen + 1):
        yield lst[i:i + sliceLen]
for slice in sliceIterator(big_list, slice_length):
    foo(slice)
sliceIterator implements a "sliding window" of width sliceLen over the sequence lst, i.e. it produces overlapping slices: [1,2,3], [2,3,4], [3,4,5], ... Not sure if that is the OP's intention, though.
Do you mean something like:
def callonslices(size, fatherList, foo):
    for i in xrange(0, len(fatherList), size):
        foo(fatherList[i:i+size])
If this is roughly the functionality you want you might, if you desire, dress it up a bit in a generator:
def sliceup(size, fatherList):
    for i in xrange(0, len(fatherList), size):
        yield fatherList[i:i+size]
and then:
def callonslices(size, fatherList, foo):
    for sli in sliceup(size, fatherList):
        foo(sli)
Answer to the last part of the question:
question update: How to modify the function you have provided to store the extra items and use them when the next fatherList is fed to the function?
If you need to store state then you can use an object for that.
class Chunker(object):
    """Split `iterable` on evenly sized chunks.
    Leftovers are remembered and yielded at the next call.
    """
    def __init__(self, chunksize):
        assert chunksize > 0
        self.chunksize = chunksize
        self.chunk = []
    def __call__(self, iterable):
        """Yield items from `iterable` `self.chunksize` at the time."""
        assert len(self.chunk) < self.chunksize
        for item in iterable:
            self.chunk.append(item)
            if len(self.chunk) == self.chunksize:
                # yield collected full chunk
                yield self.chunk
                self.chunk = []
Example:
chunker = Chunker(3)
for s in "abcd", "efgh":
    for chunk in chunker(s):
        print(''.join(chunk))
if chunker.chunk: # is there anything left?
    print(''.join(chunker.chunk))
Output:
abc
def
gh
I am not sure, but it seems you want to do what is called a moving average. numpy provides facilities for this (the convolve function).
>>> x = numpy.array(range(20))
>>> x
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19])
>>> n = 2 # moving average window
>>> numpy.convolve(numpy.ones(n)/n, x)[n-1:-n+1]
array([ 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5,
9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5])
The nice thing is that it accommodates different weighting schemes nicely (just change numpy.ones(n) / n to something else).
You can find more complete material here:
http://www.scipy.org/Cookbook/SignalSmooth
Expanding on the answer of #Ants Aasma: In Python 3.7 the handling of the StopIteration exception changed (according to PEP-479). A compatible version would be:
from itertools import chain, islice
def ichunked(seq, chunksize):
    it = iter(seq)
    while True:
        try:
            yield chain([next(it)], islice(it, chunksize - 1))
        except StopIteration:
            return
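The PEP 479 change is easy to reproduce in isolation. In this sketch, leaky is a hypothetical generator that lets a StopIteration from next() escape its body; on Python 3.7 and later the interpreter turns that into a RuntimeError instead of quietly ending the generator, which is why the try/except above is needed.

```python
def leaky():
    it = iter([])
    yield next(it)      # StopIteration escapes the generator body here

try:
    list(leaky())
    outcome = "generator ended silently"
except RuntimeError as exc:
    outcome = "RuntimeError"
    print(exc)          # message mentions that the generator raised StopIteration

print(outcome)
```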
Your question could use some more detail, but how about:
def iterate_over_slices(the_list, slice_size):
    for start in range(0, len(the_list)-slice_size):
        slice = the_list[start:start+slice_size]
        foo(slice)
For a near-one liner (after itertools import) in the vein of Nadia's answer dealing with non-chunk divisible sizes without padding:
>>> import itertools as itt
>>> chunksize = 5
>>> myseq = range(18)
>>> cnt = itt.count()
>>> print [ tuple(grp) for k,grp in itt.groupby(myseq, key=lambda x: cnt.next()//chunksize%2)]
[(0, 1, 2, 3, 4), (5, 6, 7, 8, 9), (10, 11, 12, 13, 14), (15, 16, 17)]
If you want, you can get rid of the itertools.count() requirement using enumerate(), with a rather uglier:
[ [e[1] for e in grp] for k,grp in itt.groupby(enumerate(myseq), key=lambda x: x[0]//chunksize%2) ]
(In this example the enumerate() would be superfluous, but not all sequences are neat ranges like this, obviously)
Nowhere near as neat as some other answers, but useful in a pinch, especially if already importing itertools.
A function that slices a list or an iterator into chunks of a given size. Also handles the case correctly if the last chunk is smaller:
def slice_iterator(data, slice_len):
    it = iter(data)
    while True:
        items = []
        for index in range(slice_len):
            try:
                item = next(it)
            except StopIteration:
                if items == []:
                    return # we are done
                else:
                    break # exits the "for" loop
            items.append(item)
        yield items
Usage example:
for slice in slice_iterator([1,2,3,4,5,6,7,8,9,10],3):
    print(slice)
Result:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10]
