s = [1,2,3,4,5,6,7,8,9]
n = 3
list(zip(*[iter(s)]*n)) # returns [(1,2,3),(4,5,6),(7,8,9)]
How does zip(*[iter(s)]*n) work? What would it look like if it was written with more verbose code?
iter() is an iterator over a sequence. [x] * n produces a list containing n quantity of x, i.e. a list of length n, where each element is x. *arg unpacks a sequence into arguments for a function call. Therefore you're passing the same iterator 3 times to zip(), and it pulls an item from the iterator each time.
x = iter([1,2,3,4,5,6,7,8,9])
print(list(zip(x, x, x)))
The other great answers and comments explain well the roles of argument unpacking and zip().
As Ignacio and ujukatzel say, you pass to zip() three references to the same iterator and zip() makes 3-tuples of the integers—in order—from each reference to the iterator:
1,2,3,4,5,6,7,8,9 1,2,3,4,5,6,7,8,9 1,2,3,4,5,6,7,8,9
^ ^ ^
^ ^ ^
^ ^ ^
And since you ask for a more verbose code sample:
chunk_size = 3
L = [1,2,3,4,5,6,7,8,9]
# iterate over L in steps of 3
for start in range(0,len(L),chunk_size): # xrange() in 2.x; range() in 3.x
end = start + chunk_size
print L[start:end] # three-item chunks
Following the values of start and end:
[0:3) #[1,2,3]
[3:6) #[4,5,6]
[6:9) #[7,8,9]
FWIW, you can get the same result with map() with an initial argument of None:
>>> map(None,*[iter(s)]*3)
[(1, 2, 3), (4, 5, 6), (7, 8, 9)]
For more on zip() and map(): http://muffinresearch.co.uk/archives/2007/10/16/python-transposing-lists-with-map-and-zip/
I think one thing that's missed in all the answers (probably obvious to those familiar with iterators) but not so obvious to others is -
Since we have the same iterator, it gets consumed and the remaining elements are used by the zip. So if we simply used the list and not the iter
eg.
l = range(9)
zip(*([l]*3)) # note: not an iter here, the lists are not emptied as we iterate
# output
[(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3), (4, 4, 4), (5, 5, 5), (6, 6, 6), (7, 7, 7), (8, 8, 8)]
Using iterator, pops the values and only keeps remaining available, so for zip once 0 is consumed 1 is available and then 2 and so on. A very subtle thing, but quite clever!!!
iter(s) returns an iterator for s.
[iter(s)]*n makes a list of n times the same iterator for s.
So, when doing zip(*[iter(s)]*n), it extracts an item from all the three iterators from the list in order. Since all the iterators are the same object, it just groups the list in chunks of n.
One word of advice for using zip this way. It will truncate your list if it's length is not evenly divisible. To work around this you could either use itertools.izip_longest if you can accept fill values. Or you could use something like this:
def n_split(iterable, n):
num_extra = len(iterable) % n
zipped = zip(*[iter(iterable)] * n)
return zipped if not num_extra else zipped + [iterable[-num_extra:], ]
Usage:
for ints in n_split(range(1,12), 3):
print ', '.join([str(i) for i in ints])
Prints:
1, 2, 3
4, 5, 6
7, 8, 9
10, 11
Unwinding layers of "cleverness", you may find this equivalent spelling easier to follow:
x = iter(s)
for a, b, c in zip(*([x] * n)):
print(a, b, c)
which is, in turn, equivalent to the even less-clever:
x = iter(accounts_iter)
for a, b, c in zip(x, x, x):
print(a, b, c)
Now it should start to become clear. There is only a single iterator object, x. On each iteration, zip(), under the covers, calls next(x) 3 times, once for each iterator object passed to it. But it's the same iterator object here each time. So it delivers the first 3 next(x) results, and leaves the shared iterator object waiting to deliver its 4th result next. Lather, rinse, repeat.
BTW, I suspect you're parsing *([iter(x)]*n) incorrectly in your head. The trailing *n happens first, and then the prefix * is applied to the n-element list *n created. f(*iterable) is a shortcut for calling f() with a variable number of arguments, one for each object iterable delivers.
I needed to break down each partial step to really internalize how it is working. My notes from the REPL:
>>> # refresher on using list multiples to repeat item
>>> lst = list(range(15))
>>> lst
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
>>> # lst id value
>>> id(lst)
139755081359872
>>> [id(x) for x in [lst]*3]
[139755081359872, 139755081359872, 139755081359872]
# replacing lst with an iterator of lst
# It's the same iterator three times
>>> [id(x) for x in [iter(lst)]*3 ]
[139755085005296, 139755085005296, 139755085005296]
# without starred expression zip would only see single n-item list.
>>> print([iter(lst)]*3)
[<list_iterator object at 0x7f1b440837c0>, <list_iterator object at 0x7f1b440837c0>, <list_iterator object at 0x7f1b440837c0>]
# Must use starred expression to expand n arguments
>>> print(*[iter(lst)]*3)
<list_iterator object at 0x7f1b4418b1f0> <list_iterator object at 0x7f1b4418b1f0> <list_iterator object at 0x7f1b4418b1f0>
# by repeating the same iterator, n-times,
# each pass of zip will call the same iterator.__next__() n times
# this is equivalent to manually calling __next__() until complete
>>> iter_lst = iter(lst)
>>> ((iter_lst.__next__(), iter_lst.__next__(), iter_lst.__next__()))
(0, 1, 2)
>>> ((iter_lst.__next__(), iter_lst.__next__(), iter_lst.__next__()))
(3, 4, 5)
>>> ((iter_lst.__next__(), iter_lst.__next__(), iter_lst.__next__()))
(6, 7, 8)
>>> ((iter_lst.__next__(), iter_lst.__next__(), iter_lst.__next__()))
(9, 10, 11)
>>> ((iter_lst.__next__(), iter_lst.__next__(), iter_lst.__next__()))
(12, 13, 14)
>>> ((iter_lst.__next__(), iter_lst.__next__(), iter_lst.__next__()))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
# all together now!
# continuing with same iterator multiple times in list
>>> print(*[iter(lst)]*3)
<list_iterator object at 0x7f1b4418b1f0> <list_iterator object at 0x7f1b4418b1f0> <list_iterator object at 0x7f1b4418b1f0>
>>> zip(*[iter(lst)]*3)
<zip object at 0x7f1b43f14e00>
>>> list(zip(*[iter(lst)]*3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 14)]
# NOTE: must use list multiples. Explicit listing creates 3 unique iterators
>>> [iter(lst)]*3 == [iter(lst), iter(lst), iter(lst)]
False
>>> list(zip(*[[iter(lst), iter(lst), iter(lst)]))
[(0, 0, 0), (1, 1, 1), (2, 2, 2), (3, 3, 3), ....
It is probably easier to see what is happening in python interpreter or ipython with n = 2:
In [35]: [iter("ABCDEFGH")]*2
Out[35]: [<iterator at 0x6be4128>, <iterator at 0x6be4128>]
So, we have a list of two iterators which are pointing to the same iterator object. Remember that iter on a object returns an iterator object and in this scenario, it is the same iterator twice due to the *2 python syntactic sugar. Iterators also run only once.
Further, zip takes any number of iterables (sequences are iterables) and creates tuple from i'th element of each of the input sequences. Since both iterators are identical in our case, zip moves the same iterator twice for each 2-element tuple of output.
In [41]: help(zip)
Help on built-in function zip in module __builtin__:
zip(...)
zip(seq1 [, seq2 [...]]) -> [(seq1[0], seq2[0] ...), (...)]
Return a list of tuples, where each tuple contains the i-th element
from each of the argument sequences. The returned list is truncated
in length to the length of the shortest argument sequence.
The unpacking (*) operator ensures that the iterators run to exhaustion which in this case is until there is not enough input to create a 2-element tuple.
This can be extended to any value of n and zip(*[iter(s)]*n) works as described.
x = [1,2,3,4,5,6,7,8,9]
zip(*[iter(x)] * 3)
is the same as:
x = [1,2,3,4,5,6,7,8,9]
iter_var = iter(x)
zip(iter_var,iter_var,iter_var)
Each time zip() gets the next value in iter_var it moves to the next value of x.
Try running next(iter_var) to see how this works.
Related
How are *zip and *D.items() working in the following code?
What does * operator do?
import matplotlib.pyplot as plt
D = {u'Label1':26, u'Label2': 17, u'Label3':30}
plt.bar(*zip(*D.items()))
plt.show()
I would like to add to #TrentonMcKinney's answer some examples of how unpacking with * works:
>>> l = [1, 2, 3]
>>> print(l) # == print([1, 2, 3])
[1, 2, 3]
>>> print(*l) # == print(1, 2, 3)
1 2 3
Basically print(l) was converted to print([1, 2, 3]) and then the output was the list itself [1, 2, 3].
On the other hand, print(*l) was converted to print(1, 2, 3), this is, the list was unpacked an each element was passed as a separate argument to print, so the output was 1 2 3.
So in your example, the inner * is unpacking each of the items tuples (('Label1', 26), ('Label2', 17) and ('Label3', 30)) and passing them as a separate arguments to zip (zip(('Label1', 26), ('Label2', 17), ('Label3', 30))). That returns another list-like object equivalent to trasposing those sequences, this is, groups all the first elements of every iterable together, the second elements together, ... ([('Label1', 'Label2', 'Label3'), (26, 17, 30)]). The outter * unpacks them out of the list-like object to pass each tuple to the plt.bar function(plt.bar(('Label1', 'Label2', 'Label3'), (26, 17, 30))).
Similar to * to unpack sequences, there is a ** that is used to unpack mappings into keyword arguments. f(**{'a': 1, 'b': 2}) is the same as f(a=1, b=2).
The asterisk operator as it relates to your code:
An iterable is something that can be iterated though
* unpacks an iterable
D.items() creates a dict_items object, a list of tuples. See .items() and dictionary view objects
dict_items([('Label1', 26), ('Label2', 17), ('Label3', 30)])
The * in zip(*D.items()), unpacks the dict_times object, and zip, aggregates elements from each of the iterables.
('Label1', 'Label2', 'Label3')
(26, 17, 30)
zip creates a generator (another iterable), like <zip at 0x1c9f2248e00>, which is an iterable of tuples.
The * in *zip unpacks the generator, so the plot API has access to the two tuples.
import matplotlib.pyplot as plt
D = {u'Label1':26, u'Label2': 17, u'Label3':30}
plt.bar(*zip(*D.items()))
The asterisk operator in broader terms:
PEP 448 - Additional Unpacking Generalizations
Real Python: Python args and kwargs: Demystified
SO: What does asterisk * mean in Python?
Trey Hunner: Asterisks in Python: what they are and how to use them
I have found below code in web, result is tuple of two elements in list, how to understand [iter(list)]*2?
lst = [1,2,3,4,5,6,7,8]
b=zip(*[iter(lst)]*2)
list(b)
[(1, 2), (3, 4), (5, 6), (7, 8)]
------------
[iter(lst)]*2
[<list_iterator at 0x1aff33917f0>, <list_iterator at 0x1aff33917f0>]
I check [iter(lst)]*2, same iterator above, so meaning iter repeat double,
so, if i check num from 2 to 3, result should be [(1, 2, 3), (4, 5, 6),(7,8,NaN)]
but delete 7,8
lst = [1,2,3,4,5,6,7,8]
b=zip(*[iter(lst)]*3)
list(b)
--------------
[(1, 2, 3), (4, 5, 6)]
Quite a tricky construct to explain. I'll give it a shot:
with [iter(lst)] you create a list with with one item. The item is an iterator over a list.
whenever python tries to get an element from this iterator, then the next element of lst is returned until no more element is available.
Just try following:
i = iter(lst)
next(i)
next(i)
the output should look like:
>>> lst = [1,2,3,4,5,6,7,8]
>>> i = iter(lst)
>>> next(i)
1
>>> next(i)
2
>>> next(i)
3
>>> next(i)
4
>>> next(i)
5
>>> next(i)
6
>>> next(i)
7
>>> next(i)
8
>>> next(i)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
Now you create a list that contains twice exactly the same iterator.
You do this with
itlst = [iter(lst)] * 2
try out following:
itlst1 = [iter(lst)] * 2
itlst2 = [iter(lst), iter(lst)]
print(itlst1)
print(itlst2)
The result will look something like:
>>> itlst1 = [iter(lst)] * 2
>>> itlst2 = [iter(lst), iter(lst)]
>>> print(itlst1)
[<list_iterator object at 0x7f9251172b00>, <list_iterator object at 0x7f9251172b00>]
>>> print(itlst2)
[<list_iterator object at 0x7f9251172b70>, <list_iterator object at 0x7f9251172ba8>]
What is important to notice is, that itlst1 is a list containing twice the same iterator, whereas itlst2 contains two different iterators.
to illustrate try to type:
next(itlst1[0])
next(itlst1[1])
next(itlst1[0])
next(itlst1[1])
and compare it with:
next(itlst2[0])
next(itlst2[1])
next(itlst2[0])
next(itlst2[1])
The result is:
>>> next(itlst1[0])
1
>>> next(itlst1[1])
2
>>> next(itlst1[0])
3
>>> next(itlst1[1])
4
>>>
>>> next(itlst2[0])
1
>>> next(itlst2[1])
1
>>> next(itlst2[0])
2
>>> next(itlst2[1])
2
Now to the zip() function ( https://docs.python.org/3/library/functions.html#zip ):
Try following:
i = iter(lst)
list(zip(i, i))
zip() with two parameters.
Whenver you try to get the next element from zip it will do following:
get one value from the iterable that is the first parameter
get one value from the iterable that is the second parameter
return a tuple with these two values.
list(zip(xxx)) will do this repeatedly and store the result in a list.
The result will be:
>>> i = iter(lst)
>>> list(zip(i, i))
[(1, 2), (3, 4), (5, 6), (7, 8)]
The next trick being used is the * that is used to use the first element as first parameter to a function call, the second element as second parameter and so forth) What does ** (double star/asterisk) and * (star/asterisk) do for parameters?
so writing:
itlst1 = [iter(lst)] * 2
list(zip(*itlst1))
is in this case identical to
i = iter(lst)
itlst1 = [i] * 2
list(zip(itlst1[0], itlst1[1]))
which is identical to
list(zip(i, i))
which I explained already.
Hope this explains most of the above tricks.
iter(lst) turns a list into an iterator. Iterators let you step lazily through an iterable by calling next() until the iterator runs out of items.
[iter(lst)] puts the iterator into a single-element list.
[iter(lst)] * 2 makes 2 copies of the iterator in the list, giving
it = iter(lst)
[it, it]
Both list elements are aliases of the same underlying iterator object, so whenever next() is called on either of the iterators as zip exhausts them, successive elements are yielded.
*[...] unpacks the list of the two copies of the same iterator into the arguments for zip. This creates a zip object that lets you iterate through tuples of elements from each of its arguments.
list(...) iterates through the zip object and copies the elements into a list. Since both zipped iterators point to the same underlying iterator, we get the sequential elements seen in your output.
Without using the iterator alias, you'd get
>>> list(zip(iter(lst), iter(lst)))
[(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6), (7, 7), (8, 8)]
A similar way to write list(zip(*[iter(lst)] * 2)) is list(zip(lst[::2], lst[1::2])), which seems a bit less magical (if much less performant).
The explanation for
>>> list(zip(*[iter(lst)] * 3))
[(1, 2, 3), (4, 5, 6)]
omitting elements is that the first time the zip object tries to yield a None result on any of the argument iterables, it stops and does not generate a tuple. You can use itertools.zip_longest to match your expected behavior, more or less:
>>> list(zip_longest(*[iter(lst)] * 3))
[(1, 2, 3), (4, 5, 6), (7, 8, None)]
See the canonical answer List of lists changes reflected across sublists unexpectedly if the [...] * 2 aliasing behavior is surprising.
My question is similar to this previous SO question.
I have two very large lists of data (almost 20 million data points) that contain numerous consecutive duplicates. I would like to remove the consecutive duplicate as follows:
list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2] # This is 20M long!
list2 = ... # another list of size len(list1), also 20M long!
i = 0
while i < len(list)-1:
if list[i] == list[i+1]:
del list1[i]
del list2[i]
else:
i = i+1
And the output should be [1, 2, 3, 4, 5, 1, 2] for the first list.
Unfortunately, this is very slow since deleting an element in a list is a slow operation by itself. Is there any way I can speed up this process? Please note that, as shown in the above code snipped, I also need to keep track of the index i so that I can remove the corresponding element in list2.
Python has this groupby in the libraries for you:
>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> from itertools import groupby
>>> [k for k,_ in groupby(list1)]
[1, 2, 3, 4, 5, 1, 2]
You can tweak it using the keyfunc argument, to also process the second list at the same time.
>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> list2 = [9,9,9,8,8,8,7,7,7,6,6,6,5]
>>> from operator import itemgetter
>>> keyfunc = itemgetter(0)
>>> [next(g) for k,g in groupby(zip(list1, list2), keyfunc)]
[(1, 9), (2, 7), (3, 7), (4, 7), (5, 6), (1, 6), (2, 5)]
If you want to split those pairs back into separate sequences again:
>>> zip(*_) # "unzip" them
[(1, 2, 3, 4, 5, 1, 2), (9, 7, 7, 7, 6, 6, 5)]
You can use collections.deque and its max len argument to set a window size of 2. Then just compare the duplicity of the 2 entries in the window, and append to the results if different.
def remove_adj_dups(x):
"""
:parameter x is something like '1, 1, 2, 3, 3'
from an iterable such as a string or list or a generator
:return 1,2,3, as list
"""
result = []
from collections import deque
d = deque([object()], maxlen=2) # 1st entry is object() which only matches with itself. Kudos to Trey Hunner -->object()
for i in x:
d.append(i)
a, b = d
if a != b:
result.append(b)
return result
I generated a random list with duplicates of 20 million numbers between 0 and 10.
def random_nums_with_dups(number_range=None, range_len=None):
"""
:parameter
:param number_range: use the numbers between 0 and number_range. The smaller this is then the more dups
:param range_len: max len of the results list used in the generator
:return: a generator
Note: If number_range = 2, then random binary is returned
"""
import random
return (random.choice(range(number_range)) for i in range(range_len))
I then tested with
range_len = 2000000
def mytest():
for i in [1]:
return [remove_adj_dups(random_nums_with_dups(number_range=10, range_len=range_len))]
big_result = mytest()
big_result = mytest()[0]
print(len(big_result))
The len was 1800197 (read dups removed), in <5 secs, which includes the random list generator spinning up.
I lack the experience/knowhow to say if it is memory efficient as well. Could someone comment please
I asked some similar questions [1, 2] yesterday and got great answers, but I am not yet technically skilled enough to write a generator of such sophistication myself.
How could I write a generator that would raise StopIteration if it's the last item, instead of yielding it?
I am thinking I should somehow ask two values at a time, and see if the 2nd value is StopIteration. If it is, then instead of yielding the first value, I should raise this StopIteration. But somehow I should also remember the 2nd value that I asked if it wasn't StopIteration.
I don't know how to write it myself. Please help.
For example, if the iterable is [1, 2, 3], then the generator should return 1 and 2.
Thanks, Boda Cydo.
[1] How do I modify a generator in Python?
[2] How to determine if the value is ONE-BUT-LAST in a Python generator?
This should do the trick:
def allbutlast(iterable):
it = iter(iterable)
current = it.next()
for i in it:
yield current
current = i
>>> list(allbutlast([1,2,3]))
[1, 2]
This will iterate through the entire list, and return the previous item so the last item is never returned.
Note that calling the above on both [] and [1] will return an empty list.
First off, is a generator really needed? This sounds like the perfect job for Python’s slices syntax:
result = my_range[ : -1]
I.e.: take a range form the first item to the one before the last.
the itertools module shows a pairwise() method in its recipes. adapting from this recipe, you can get your generator:
from itertools import *
def n_apart(iterable, n):
a,b = tee(iterable)
for count in range(n):
next(b)
return zip(a,b)
def all_but_n_last(iterable, n):
return (value for value,dummy in n_apart(iterable, n))
the n_apart() function return pairs of values which are n elements apart in the input iterable, ignoring all pairs . all_but_b_last() returns the first value of all pairs, which incidentally ignores the n last elements of the list.
>>> data = range(10)
>>> list(data)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(n_apart(data,3))
[(0, 3), (1, 4), (2, 5), (3, 6), (4, 7), (5, 8), (6, 9)]
>>> list(all_but_n_last(data,3))
[0, 1, 2, 3, 4, 5, 6]
>>>
>>> list(all_but_n_last(data,1))
[0, 1, 2, 3, 4, 5, 6, 7, 8]
The more_itertools project has a tool that emulates itertools.islice with support for negative indices:
import more_itertools as mit
list(mit.islice_extended([1, 2, 3], None, -1))
# [1, 2]
gen = (x for x in iterable[:-1])
I want an algorithm to iterate over list slices. Slices size is set outside the function and can differ.
In my mind it is something like:
for list_of_x_items in fatherList:
foo(list_of_x_items)
Is there a way to properly define list_of_x_items or some other way of doing this using python 2.5?
edit1: Clarification Both "partitioning" and "sliding window" terms sound applicable to my task, but I am no expert. So I will explain the problem a bit deeper and add to the question:
The fatherList is a multilevel numpy.array I am getting from a file. Function has to find averages of series (user provides the length of series) For averaging I am using the mean() function. Now for question expansion:
edit2: How to modify the function you have provided to store the extra items and use them when the next fatherList is fed to the function?
for example if the list is lenght 10 and size of a chunk is 3, then the 10th member of the list is stored and appended to the beginning of the next list.
Related:
What is the most “pythonic” way to iterate over a list in chunks?
If you want to divide a list into slices you can use this trick:
list_of_slices = zip(*(iter(the_list),) * slice_size)
For example
>>> zip(*(iter(range(10)),) * 3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
If the number of items is not dividable by the slice size and you want to pad the list with None you can do this:
>>> map(None, *(iter(range(10)),) * 3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
It is a dirty little trick
OK, I'll explain how it works. It'll be tricky to explain but I'll try my best.
First a little background:
In Python you can multiply a list by a number like this:
[1, 2, 3] * 3 -> [1, 2, 3, 1, 2, 3, 1, 2, 3]
([1, 2, 3],) * 3 -> ([1, 2, 3], [1, 2, 3], [1, 2, 3])
And an iterator object can be consumed once like this:
>>> l=iter([1, 2, 3])
>>> l.next()
1
>>> l.next()
2
>>> l.next()
3
The zip function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. For example:
zip([1, 2, 3], [20, 30, 40]) -> [(1, 20), (2, 30), (3, 40)]
zip(*[(1, 20), (2, 30), (3, 40)]) -> [[1, 2, 3], [20, 30, 40]]
The * in front of zip used to unpack arguments. You can find more details here.
So
zip(*[(1, 20), (2, 30), (3, 40)])
is actually equivalent to
zip((1, 20), (2, 30), (3, 40))
but works with a variable number of arguments
Now back to the trick:
list_of_slices = zip(*(iter(the_list),) * slice_size)
iter(the_list) -> convert the list into an iterator
(iter(the_list),) * N -> will generate an N reference to the_list iterator.
zip(*(iter(the_list),) * N) -> will feed those list of iterators into zip. Which in turn will group them into N sized tuples. But since all N items are in fact references to the same iterator iter(the_list) the result will be repeated calls to next() on the original iterator
I hope that explains it. I advice you to go with an easier to understand solution. I was only tempted to mention this trick because I like it.
If you want to be able to consume any iterable you can use these functions:
from itertools import chain, islice
def ichunked(seq, chunksize):
"""Yields items from an iterator in iterable chunks."""
it = iter(seq)
while True:
yield chain([it.next()], islice(it, chunksize-1))
def chunked(seq, chunksize):
"""Yields items from an iterator in list chunks."""
for chunk in ichunked(seq, chunksize):
yield list(chunk)
Use a generator:
big_list = [1,2,3,4,5,6,7,8,9]
slice_length = 3
def sliceIterator(lst, sliceLen):
for i in range(len(lst) - sliceLen + 1):
yield lst[i:i + sliceLen]
for slice in sliceIterator(big_list, slice_length):
foo(slice)
sliceIterator implements a "sliding window" of width sliceLen over the squence lst, i.e. it produces overlapping slices: [1,2,3], [2,3,4], [3,4,5], ... Not sure if that is the OP's intention, though.
Do you mean something like:
def callonslices(size, fatherList, foo):
for i in xrange(0, len(fatherList), size):
foo(fatherList[i:i+size])
If this is roughly the functionality you want you might, if you desire, dress it up a bit in a generator:
def sliceup(size, fatherList):
for i in xrange(0, len(fatherList), size):
yield fatherList[i:i+size]
and then:
def callonslices(size, fatherList, foo):
for sli in sliceup(size, fatherList):
foo(sli)
Answer to the last part of the question:
question update: How to modify the
function you have provided to store
the extra items and use them when the
next fatherList is fed to the
function?
If you need to store state then you can use an object for that.
class Chunker(object):
"""Split `iterable` on evenly sized chunks.
Leftovers are remembered and yielded at the next call.
"""
def __init__(self, chunksize):
assert chunksize > 0
self.chunksize = chunksize
self.chunk = []
def __call__(self, iterable):
"""Yield items from `iterable` `self.chunksize` at the time."""
assert len(self.chunk) < self.chunksize
for item in iterable:
self.chunk.append(item)
if len(self.chunk) == self.chunksize:
# yield collected full chunk
yield self.chunk
self.chunk = []
Example:
chunker = Chunker(3)
for s in "abcd", "efgh":
for chunk in chunker(s):
print ''.join(chunk)
if chunker.chunk: # is there anything left?
print ''.join(chunker.chunk)
Output:
abc
def
gh
I am not sure, but it seems you want to do what is called a moving average. numpy provides facilities for this (the convolve function).
>>> x = numpy.array(range(20))
>>> x
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19])
>>> n = 2 # moving average window
>>> numpy.convolve(numpy.ones(n)/n, x)[n-1:-n+1]
array([ 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5,
9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5])
The nice thing is that it accomodates different weighting schemes nicely (just change numpy.ones(n) / n to something else).
You can find a complete material here:
http://www.scipy.org/Cookbook/SignalSmooth
Expanding on the answer of #Ants Aasma: In Python 3.7 the handling of the StopIteration exception changed (according to PEP-479). A compatible version would be:
from itertools import chain, islice
def ichunked(seq, chunksize):
it = iter(seq)
while True:
try:
yield chain([next(it)], islice(it, chunksize - 1))
except StopIteration:
return
Your question could use some more detail, but how about:
def iterate_over_slices(the_list, slice_size):
for start in range(0, len(the_list)-slice_size):
slice = the_list[start:start+slice_size]
foo(slice)
For a near-one liner (after itertools import) in the vein of Nadia's answer dealing with non-chunk divisible sizes without padding:
>>> import itertools as itt
>>> chunksize = 5
>>> myseq = range(18)
>>> cnt = itt.count()
>>> print [ tuple(grp) for k,grp in itt.groupby(myseq, key=lambda x: cnt.next()//chunksize%2)]
[(0, 1, 2, 3, 4), (5, 6, 7, 8, 9), (10, 11, 12, 13, 14), (15, 16, 17)]
If you want, you can get rid of the itertools.count() requirement using enumerate(), with a rather uglier:
[ [e[1] for e in grp] for k,grp in itt.groupby(enumerate(myseq), key=lambda x: x[0]//chunksize%2) ]
(In this example the enumerate() would be superfluous, but not all sequences are neat ranges like this, obviously)
Nowhere near as neat as some other answers, but useful in a pinch, especially if already importing itertools.
A function that slices a list or an iterator into chunks of a given size. Also handles the case correctly if the last chunk is smaller:
def slice_iterator(data, slice_len):
it = iter(data)
while True:
items = []
for index in range(slice_len):
try:
item = next(it)
except StopIteration:
if items == []:
return # we are done
else:
break # exits the "for" loop
items.append(item)
yield items
Usage example:
for slice in slice_iterator([1,2,3,4,5,6,7,8,9,10],3):
print(slice)
Result:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10]