Iteration over list slices - python

I want an algorithm to iterate over list slices. The slice size is set outside the function and can differ.
In my mind it is something like:
for list_of_x_items in fatherList:
    foo(list_of_x_items)
Is there a way to properly define list_of_x_items or some other way of doing this using python 2.5?
edit1: Clarification. Both the "partitioning" and "sliding window" terms sound applicable to my task, but I am no expert. So I will explain the problem a bit deeper and add to the question:
The fatherList is a multilevel numpy.array I am getting from a file. The function has to find the averages of series (the user provides the length of a series). For averaging I am using the mean() function. Now for the question expansion:
edit2: How to modify the function you have provided to store the extra items and use them when the next fatherList is fed to the function?
For example, if the list has length 10 and the chunk size is 3, then the 10th member of the list is stored and prepended to the beginning of the next list.
Related:
What is the most “pythonic” way to iterate over a list in chunks?

If you want to divide a list into slices you can use this trick:
list_of_slices = zip(*(iter(the_list),) * slice_size)
For example
>>> zip(*(iter(range(10)),) * 3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
If the number of items is not divisible by the slice size and you want to pad the list with None you can do this:
>>> map(None, *(iter(range(10)),) * 3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
It is a dirty little trick
OK, I'll explain how it works. It'll be tricky to explain but I'll try my best.
First a little background:
In Python you can multiply a list by a number like this:
[1, 2, 3] * 3 -> [1, 2, 3, 1, 2, 3, 1, 2, 3]
([1, 2, 3],) * 3 -> ([1, 2, 3], [1, 2, 3], [1, 2, 3])
And an iterator object can be consumed once like this:
>>> l=iter([1, 2, 3])
>>> l.next()
1
>>> l.next()
2
>>> l.next()
3
The zip function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. For example:
zip([1, 2, 3], [20, 30, 40]) -> [(1, 20), (2, 30), (3, 40)]
zip(*[(1, 20), (2, 30), (3, 40)]) -> [(1, 2, 3), (20, 30, 40)]
The * in front of zip is used to unpack arguments. You can find more details here.
So
zip(*[(1, 20), (2, 30), (3, 40)])
is actually equivalent to
zip((1, 20), (2, 30), (3, 40))
but works with a variable number of arguments
Now back to the trick:
list_of_slices = zip(*(iter(the_list),) * slice_size)
iter(the_list) -> converts the list into an iterator
(iter(the_list),) * N -> creates a tuple with N references to the same iterator.
zip(*(iter(the_list),) * N) -> feeds those N iterator references into zip, which groups them into N-sized tuples. But since all N items are in fact references to the same iterator, the result is repeated calls to next() on the original iterator.
I hope that explains it. I advise you to go with an easier-to-understand solution. I was only tempted to mention this trick because I like it.
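A side note for Python 3 readers (a sketch, not part of the original recipe): zip there returns an iterator and map(None, ...) no longer pads, so the padded variant becomes itertools.zip_longest:

from itertools import zip_longest

the_list, slice_size = list(range(10)), 3
list_of_slices = list(zip(*(iter(the_list),) * slice_size))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8)]
padded = list(zip_longest(*(iter(the_list),) * slice_size))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]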

If you want to be able to consume any iterable you can use these functions:
from itertools import chain, islice
def ichunked(seq, chunksize):
    """Yields items from an iterator in iterable chunks."""
    it = iter(seq)
    while True:
        yield chain([it.next()], islice(it, chunksize - 1))

def chunked(seq, chunksize):
    """Yields items from an iterator in list chunks."""
    for chunk in ichunked(seq, chunksize):
        yield list(chunk)
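A quick usage sketch (Python 2, matching the it.next() call above); each chunk arrives as a plain list and the last one may be short:

for chunk in chunked(range(10), 3):
    print chunk
# [0, 1, 2]
# [3, 4, 5]
# [6, 7, 8]
# [9]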

Use a generator:
big_list = [1,2,3,4,5,6,7,8,9]
slice_length = 3
def sliceIterator(lst, sliceLen):
    for i in range(len(lst) - sliceLen + 1):
        yield lst[i:i + sliceLen]

for slice in sliceIterator(big_list, slice_length):
    foo(slice)
sliceIterator implements a "sliding window" of width sliceLen over the sequence lst, i.e. it produces overlapping slices: [1,2,3], [2,3,4], [3,4,5], ... Not sure if that is the OP's intention, though.

Do you mean something like:
def callonslices(size, fatherList, foo):
    for i in xrange(0, len(fatherList), size):
        foo(fatherList[i:i+size])
If this is roughly the functionality you want you might, if you desire, dress it up a bit in a generator:
def sliceup(size, fatherList):
    for i in xrange(0, len(fatherList), size):
        yield fatherList[i:i+size]
and then:
def callonslices(size, fatherList, foo):
    for sli in sliceup(size, fatherList):
        foo(sli)
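For completeness, a small usage sketch, where foo is just a stand-in that prints its argument:

def foo(chunk):
    print chunk

callonslices(4, range(10), foo)
# [0, 1, 2, 3]
# [4, 5, 6, 7]
# [8, 9]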

Answer to the last part of the question:
question update: How to modify the function you have provided to store the extra items and use them when the next fatherList is fed to the function?
If you need to store state then you can use an object for that.
class Chunker(object):
    """Split `iterable` into evenly sized chunks.

    Leftovers are remembered and yielded at the next call.
    """
    def __init__(self, chunksize):
        assert chunksize > 0
        self.chunksize = chunksize
        self.chunk = []

    def __call__(self, iterable):
        """Yield items from `iterable`, `self.chunksize` at a time."""
        assert len(self.chunk) < self.chunksize
        for item in iterable:
            self.chunk.append(item)
            if len(self.chunk) == self.chunksize:
                # yield the collected full chunk
                yield self.chunk
                self.chunk = []
Example:
chunker = Chunker(3)
for s in "abcd", "efgh":
    for chunk in chunker(s):
        print ''.join(chunk)

if chunker.chunk:  # is there anything left?
    print ''.join(chunker.chunk)
Output:
abc
def
gh

I am not sure, but it seems you want to compute what is called a moving average. numpy provides facilities for this (the convolve function).
>>> x = numpy.array(range(20))
>>> x
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19])
>>> n = 2 # moving average window
>>> numpy.convolve(numpy.ones(n)/n, x)[n-1:-n+1]
array([ 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5,
9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5])
The nice thing is that it accommodates different weighting schemes nicely (just change numpy.ones(n) / n to something else).
You can find a complete material here:
http://www.scipy.org/Cookbook/SignalSmooth
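To illustrate the weighting remark, here is a sketch that swaps the uniform window for an (arbitrarily chosen) triangular one:

import numpy
x = numpy.array(range(20))
n = 3
weights = numpy.array([1.0, 2.0, 1.0])  # triangular window, for illustration only
weights = weights / weights.sum()       # normalize so the weights sum to 1
smoothed = numpy.convolve(weights, x)[n-1:-n+1]
# the center sample now gets twice the weight of its neighbours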

Expanding on the answer of @Ants Aasma: in Python 3.7 the handling of the StopIteration exception changed (according to PEP 479). A compatible version would be:
from itertools import chain, islice
def ichunked(seq, chunksize):
    it = iter(seq)
    while True:
        try:
            yield chain([next(it)], islice(it, chunksize - 1))
        except StopIteration:
            return
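A usage sketch; as with the original, each chunk must be consumed fully before advancing to the next:

for chunk in ichunked(range(10), 3):
    print(list(chunk))  # consuming the chain object materializes the chunk
# [0, 1, 2]
# [3, 4, 5]
# [6, 7, 8]
# [9]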

Your question could use some more detail, but how about:
def iterate_over_slices(the_list, slice_size):
    # +1 so the final full-length slice is included
    for start in range(0, len(the_list) - slice_size + 1):
        slice = the_list[start:start + slice_size]
        foo(slice)

For a near one-liner (after the itertools import) in the vein of Nadia's answer, dealing with non-chunk-divisible sizes without padding:
>>> import itertools as itt
>>> chunksize = 5
>>> myseq = range(18)
>>> cnt = itt.count()
>>> print [ tuple(grp) for k,grp in itt.groupby(myseq, key=lambda x: cnt.next()//chunksize%2)]
[(0, 1, 2, 3, 4), (5, 6, 7, 8, 9), (10, 11, 12, 13, 14), (15, 16, 17)]
If you want, you can get rid of the itertools.count() requirement using enumerate(), at the cost of a rather uglier expression:
[ [e[1] for e in grp] for k,grp in itt.groupby(enumerate(myseq), key=lambda x: x[0]//chunksize%2) ]
(In this example the enumerate() would be superfluous, but not all sequences are neat ranges like this, obviously)
Nowhere near as neat as some other answers, but useful in a pinch, especially if already importing itertools.
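Since cnt.next() is Python 2 only, here is a Python 3 sketch of the same groupby idea (the function name is made up; note that // chunksize alone already changes between adjacent chunks, so the % 2 is optional):

import itertools as itt

def chunks_by_groupby(seq, chunksize):
    cnt = itt.count()
    return [tuple(grp) for _, grp in
            itt.groupby(seq, key=lambda x: next(cnt) // chunksize)]

print(chunks_by_groupby(range(18), 5))
# [(0, 1, 2, 3, 4), (5, 6, 7, 8, 9), (10, 11, 12, 13, 14), (15, 16, 17)]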

A function that slices a list or an iterator into chunks of a given size. Also handles the case correctly if the last chunk is smaller:
def slice_iterator(data, slice_len):
    it = iter(data)
    while True:
        items = []
        for index in range(slice_len):
            try:
                item = next(it)
            except StopIteration:
                if items == []:
                    return  # we are done
                else:
                    break  # exits the "for" loop
            items.append(item)
        yield items
Usage example:
for slice in slice_iterator([1,2,3,4,5,6,7,8,9,10], 3):
    print(slice)
Result:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10]

Related

Fast removal of consecutive duplicates in a list and corresponding items from another list

My question is similar to this previous SO question.
I have two very large lists of data (almost 20 million data points) that contain numerous consecutive duplicates. I would like to remove the consecutive duplicate as follows:
list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2] # This is 20M long!
list2 = ... # another list of size len(list1), also 20M long!
i = 0
while i < len(list1) - 1:
    if list1[i] == list1[i+1]:
        del list1[i]
        del list2[i]
    else:
        i = i + 1
And the output should be [1, 2, 3, 4, 5, 1, 2] for the first list.
Unfortunately, this is very slow since deleting an element from a list is a slow operation by itself. Is there any way I can speed up this process? Please note that, as shown in the above code snippet, I also need to keep track of the index i so that I can remove the corresponding element in list2.
Python has this groupby in the libraries for you:
>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> from itertools import groupby
>>> [k for k,_ in groupby(list1)]
[1, 2, 3, 4, 5, 1, 2]
You can tweak it using the keyfunc argument, to also process the second list at the same time.
>>> list1 = [1,1,1,1,1,1,2,3,4,4,5,1,2]
>>> list2 = [9,9,9,8,8,8,7,7,7,6,6,6,5]
>>> from operator import itemgetter
>>> keyfunc = itemgetter(0)
>>> [next(g) for k,g in groupby(zip(list1, list2), keyfunc)]
[(1, 9), (2, 7), (3, 7), (4, 7), (5, 6), (1, 6), (2, 5)]
If you want to split those pairs back into separate sequences again:
>>> zip(*_) # "unzip" them
[(1, 2, 3, 4, 5, 1, 2), (9, 7, 7, 7, 6, 6, 5)]
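Putting the two steps together, a minimal end-to-end sketch (the helper name is made up; written for Python 2 era zip, but groupby and zip behave the same way here on Python 3):

from itertools import groupby
from operator import itemgetter

def dedup_pair(list1, list2):
    # keep the first pair of each run of equal list1 values
    pairs = [next(g) for _, g in groupby(zip(list1, list2), itemgetter(0))]
    new1, new2 = zip(*pairs)  # "unzip" back into two tuples
    return list(new1), list(new2)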
You can use collections.deque and its maxlen argument to set a window size of 2. Then just compare the 2 entries in the window, and append to the results if they differ.
def remove_adj_dups(x):
    """
    :parameter x: something like '1, 1, 2, 3, 3',
        from an iterable such as a string or list or a generator
    :return: 1, 2, 3, as a list
    """
    result = []
    from collections import deque
    d = deque([object()], maxlen=2)  # 1st entry is object(), which only matches itself. Kudos to Trey Hunner --> object()
    for i in x:
        d.append(i)
        a, b = d
        if a != b:
            result.append(b)
    return result
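A quick sanity check of the behaviour:

print(remove_adj_dups([1, 1, 1, 2, 3, 3, 1]))
# [1, 2, 3, 1] - only *consecutive* duplicates are dropped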
I generated a random list with duplicates of 2 million numbers between 0 and 10.
def random_nums_with_dups(number_range=None, range_len=None):
    """
    :param number_range: use the numbers between 0 and number_range. The smaller this is, the more dups
    :param range_len: max len of the results list used in the generator
    :return: a generator

    Note: if number_range = 2, then random binary is returned
    """
    import random
    return (random.choice(range(number_range)) for i in range(range_len))
I then tested with
range_len = 2000000

def mytest():
    for i in [1]:
        return [remove_adj_dups(random_nums_with_dups(number_range=10, range_len=range_len))]

big_result = mytest()[0]
print(len(big_result))
The resulting len was 1800197 (i.e. with dups removed), in under 5 seconds, which includes the random list generator spinning up.
I lack the experience/know-how to say whether it is memory efficient as well. Could someone comment, please?

Accessing consecutive items when using a generator

Let's say I have a tuple generator, which I simulate as follows:
g = (x for x in (1,2,3,97,98,99))
For this specific generator, I wish to write a function to output the following:
(1,2,3)
(2,3,97)
(3,97,98)
(97,98,99)
(98,99)
(99)
So I'm iterating over three consecutive items at a time and printing them, except when I approach the end.
Should the first line in my function be:
t = tuple(g)
In other words, is it best to work on a tuple directly or might it be beneficial to work with a generator. If it is possible to approach this problem using both methods, please state the benefits and disadvantages for both approaches. Also, if it might be wise to use the generator approach, how might such a solution look?
Here's what I currently do:
def f(data, l):
    t = tuple(data)
    for j in range(len(t)):
        print(t[j:j+l])

data = (x for x in (1,2,3,4,5))
f(data, 3)
UPDATE:
Note that I've updated my function to take a second argument specifying the length of the window.
A specific example for returning three items could read
def yield3(gen):
    b, c = gen.next(), gen.next()
    try:
        while True:
            a, b, c = b, c, gen.next()
            yield (a, b, c)
    except StopIteration:
        yield (b, c)
        yield (c,)

g = (x for x in (1,2,3,97,98,99))
for l in yield3(g):
    print l
Actually there are functions for this in the itertools module - tee() and izip_longest():
>>> from itertools import izip_longest, tee
>>> g = (x for x in (1,2,3,97,98,99))
>>> a, b, c = tee(g, 3)
>>> next(b, None)
>>> next(c, None)
>>> next(c, None)
>>> [[x for x in l if x is not None] for l in izip_longest(a, b, c)]
[[1, 2, 3], [2, 3, 97], [3, 97, 98], [97, 98, 99], [98, 99], [99]]
From the documentation:
Return n independent iterators from a single iterable. Equivalent to:
def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque:          # when the local deque is empty
                newval = next(it)    # fetch a new value and
                for d in deques:     # load it to all the deques
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)
If you might need to take more than three elements at a time, and you don't want to load the whole generator into memory, I suggest using a deque from the collections module in the standard library to store the current set of items. A deque (pronounced "deck" and meaning "double-ended queue") can have values pushed and popped efficiently from both ends.
from collections import deque
from itertools import islice
def get_tuples(gen, n):
    q = deque(islice(gen, n))    # pre-load the queue with `n` values
    while q:                     # run until the queue is empty
        yield tuple(q)           # yield a tuple copied from the current queue
        q.popleft()              # remove the oldest value from the queue
        try:
            q.append(next(gen))  # try to add a new value from the generator
        except StopIteration:
            pass                 # but we don't care if there are none left
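A usage sketch with the generator from the question:

g = (x for x in (1, 2, 3, 97, 98, 99))
for t in get_tuples(g, 3):
    print(t)
# (1, 2, 3)
# (2, 3, 97)
# (3, 97, 98)
# (97, 98, 99)
# (98, 99)
# (99,)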
Actually, it depends.
A generator might be useful in the case of very large collections, where you don't really need to store them all in memory to achieve the result you want.
On the other hand, since you have to print the result, it seems safe to guess that the collection isn't huge, so it doesn't make a difference.
However, here is a generator that achieves what you were looking for:
def part(gen, size):
    t = tuple()
    try:
        while True:
            l = gen.next()
            if len(t) < size:
                t = t + (l,)
                if len(t) == size:
                    yield t
                continue
            if len(t) == size:
                t = t[1:] + (l,)
                yield t
                continue
    except StopIteration:
        while len(t) > 1:
            t = t[1:]
            yield t
>>> a = (x for x in range(10))
>>> list(part(a, 3))
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9), (8, 9), (9,)]
>>> a = (x for x in range(10))
>>> list(part(a, 5))
[(0, 1, 2, 3, 4), (1, 2, 3, 4, 5), (2, 3, 4, 5, 6), (3, 4, 5, 6, 7), (4, 5, 6, 7, 8), (5, 6, 7, 8, 9), (6, 7, 8, 9), (7, 8, 9), (8, 9), (9,)]
>>>
Note: the code admittedly isn't very elegant, but it also works when you have to split into, say, 5-element windows.
It's definitely best to work with the generator because you don't want to have to hold everything in memory.
It can be done very simply with a deque.
from collections import deque
from itertools import islice
def overlapping_chunks(size, iterable, *, head=False, tail=False):
    """
    Get overlapping subsections of an iterable of a specified size.

        print(*overlapping_chunks(3, (1,2,3,97,98,99)))
        #>>> [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99]

    If head is truthy, the "warm up" before the specified maximum
    number of items is included.

        print(*overlapping_chunks(3, (1,2,3,97,98,99), head=True))
        #>>> [1] [1, 2] [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99]

    If tail is truthy, the "cool down" after the iterable is exhausted
    is included.

        print(*overlapping_chunks(3, (1,2,3,97,98,99), tail=True))
        #>>> [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99] [98, 99] [99]
    """
    chunker = deque(maxlen=size)
    iterator = iter(iterable)
    for item in islice(iterator, size-1):
        chunker.append(item)
        if head:
            yield list(chunker)
    for item in iterator:
        chunker.append(item)
        yield list(chunker)
    if tail:
        while len(chunker) > 1:
            chunker.popleft()
            yield list(chunker)
I think what you currently do seems a lot easier than any of the above. If there isn't any particular need to make it more complicated, my opinion would be to keep it simple. In other words, it is best to work on a tuple directly.
Here's a generator that works in both Python 2.7.17 and 3.8.1. Internally it uses iterators and generators whenever possible, so it should be relatively memory efficient.
try:
    from itertools import izip, izip_longest, takewhile
except ImportError:  # Python 3
    izip = zip
    from itertools import zip_longest as izip_longest, takewhile

def tuple_window(n, iterable):
    iterators = [iter(iterable) for _ in range(n)]
    for n, iterator in enumerate(iterators):
        for _ in range(n):
            next(iterator)

    _NULL = object()  # Unique singleton object.
    for t in izip_longest(*iterators, fillvalue=_NULL):
        yield tuple(takewhile(lambda v: v is not _NULL, t))

if __name__ == '__main__':
    data = (1, 2, 3, 97, 98, 99)
    for t in tuple_window(3, data):
        print(t)
Output:
(1, 2, 3)
(2, 3, 97)
(3, 97, 98)
(97, 98, 99)
(98, 99)
(99,)

Traversing a sequence of generators

I have a sequence of generators: (gen_0, gen_1, ... gen_n)
These generators will create their values lazily but are finite and will have potentially different lengths.
I need to be able to construct another generator that yields the first element of each generator in order, followed by the second and so forth, skipping values from generators that have been exhausted.
I think this problem is analogous to taking the tuple
((1, 4, 7, 10, 13, 16), (2, 5, 8, 11, 14), (3, 6, 9, 12, 15, 17, 18))
and traversing it so that it would yield the numbers from 1 through 18 in order.
I'm working on solving this simple example using (genA, genB, genC) with genA yielding values from (1, 4, 7, 10, 13, 16), genB yielding (2, 5, 8, 11, 14) and genC yielding (3, 6, 9, 12, 15, 17, 18).
To solve the simpler problem with the tuple of tuples, the answer would be fairly simple if the elements of the tuple were the same length. If the variable 'a' referred to the tuple, you could use
[i for t in zip(*a) for i in t]
Unfortunately the items are not necessarily the same length and the zip trick doesn't seem to work for generators anyway.
So far my code is horribly ugly and I'm failing to find anything approaching a clean solution. Help?
I think you need itertools.izip_longest
>>> list([e for e in t if e is not None] for t in itertools.izip_longest(*some_gen, fillvalue=None))
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17], [18]]
>>>
If you look at the documentation for itertools.izip_longest, you'll see that it gives a pure-Python implementation. It's easy to modify this implementation so that it produces the results you need instead (that is, just like izip_longest, but without any fillvalue):
class ZipExhausted(Exception):
    pass

def izip_longest_nofill(*args):
    """
    Return a generator whose .next() method returns a tuple where the
    i-th element comes from the i-th iterable argument that has not
    yet been exhausted. The .next() method continues until all
    iterables in the argument sequence have been exhausted and then it
    raises StopIteration.

    >>> list(izip_longest_nofill(*[xrange(i,2*i) for i in 2,3,5]))
    [(2, 3, 5), (3, 4, 6), (5, 7), (8,), (9,)]
    """
    iterators = map(iter, args)
    def zip_next():
        i = 0
        while i < len(iterators):
            try:
                yield next(iterators[i])
                i += 1
            except StopIteration:
                del iterators[i]
        if i == 0:
            raise ZipExhausted
    try:
        while iterators:
            yield tuple(zip_next())
    except ZipExhausted:
        pass
This avoids the need to re-filter the output of izip_longest to discard the fillvalues. Alternatively, if you want a "flattened" output:
def iter_round_robin(*args):
    """
    Return a generator whose .next() method cycles round the iterable
    arguments in turn (ignoring ones that have been exhausted). The
    .next() method continues until all iterables in the argument
    sequence have been exhausted and then it raises StopIteration.

    >>> list(iter_round_robin(*[xrange(i) for i in 2,3,5]))
    [0, 0, 0, 1, 1, 1, 2, 2, 3, 4]
    """
    iterators = map(iter, args)
    while iterators:
        i = 0
        while i < len(iterators):
            try:
                yield next(iterators[i])
                i += 1
            except StopIteration:
                del iterators[i]
Another itertools option if you want them all collapsed in a single list; this (as @gg.kaspersky already pointed out in another thread) does not handle generated None values.
g = (generator1, generator2, generator3)
res = [e for e in itertools.chain(*itertools.izip_longest(*g)) if e is not None]
print res
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
You might consider itertools.izip_longest, but in case None is a valid value, that solution will fail. Here is a sample "another generator", which does exactly what you asked for, and is pretty clean:
def my_gen(generators):
    while True:
        rez = ()
        for gen in generators:
            try:
                rez = rez + (gen.next(),)
            except StopIteration:
                pass
        if rez:
            yield rez
        else:
            break
print [x for x in my_gen((iter(xrange(2)), iter(xrange(3)), iter(xrange(1))))]
[(0, 0, 0), (1, 1), (2,)] #output

python: how to slice/store iter pointed data in a fixed buffer class?

All,
As you know, with a Python iterator we can call iter.next() to get the next data item.
Take a list for example:
l = [x for x in range(100)]
itl = iter(l)
itl.next() # 0
itl.next() # 1
Now I want a buffer class that can store fixed-size slices of the data a general iterator points to. I will use the list iterator above to demonstrate my question.
class IterPage(iter, size):  # pseudocode; not valid Python as written
    # class code here

itp = IterPage(itl, 5)
What I want is:
print itp.first()  # [0,1,2,3,4]
print itp.next()   # [5,6,7,8,9]
print itp.prev()   # [0,1,2,3,4]
len(itp)           # 20, since 100 items / fixed size 5 = 20
print itp.last()   # [95,96,97,98,99]
for y in itp:  # a plain iter may not support "for" and len(iter); similar code is also needed here
    print y
[0,1,2,3,4]
[5,6,7,8,9]
...
[95,96,97,98,99]
It is not homework, but as a Python beginner I know little about designing an iterator class - could someone show me how to code the class IterPage here?
Also, from the answers below I found that if the raw data I want to slice is very big - for example an 8 GB text file or a table with 10^100 records in a database - it may not be possible to read all of it into a list; I do not have that much physical memory. Take the snippet in the Python documentation for example:
http://docs.python.org/library/sqlite3.html#
>>> c = conn.cursor()
>>> c.execute('select * from stocks order by price')
>>> for row in c:
... print row
...
(u'2006-01-05', u'BUY', u'RHAT', 100, 35.14)
(u'2006-03-28', u'BUY', u'IBM', 1000, 45.0)
(u'2006-04-06', u'SELL', u'IBM', 500, 53.0)
(u'2006-04-05', u'BUY', u'MSOFT', 1000, 72.0)
If here we've got about 10^100 records, is it possible to store only the lines/records I want with this class, via itp = IterPage(c, 5)? If I invoke itp.next(), can itp just fetch the next 5 records from the database?
Thanks!
PS: I found an approach at the link below:
http://code.activestate.com/recipes/577196-windowing-an-iterable-with-itertools/
I also found that someone wanted to add an itertools.iwindow() function, but it was rejected:
http://mail.python.org/pipermail/python-dev/2006-May/065304.html
Since you asked about design, I'll write a bit about what you want - it's not an iterator.
The defining property of an iterator is that it only supports iteration, not random access. But methods like .first and .last do random access, so what you ask for is not an iterator.
There are of course containers that allow this. They are called sequences, and the simplest of them is the list. Its .first method is written as [0] and its .last as [-1].
So here is such an object that slices a given sequence. It stores a list of slice objects, which is what Python uses to slice out parts of a list. The methods that a class must implement to be a sequence are given by the abstract base class Sequence. It's nice to inherit from it because it throws errors if you forget to implement a required method.
from collections import Sequence

class SlicedList(Sequence):
    def __init__(self, iterable, size):
        self.seq = list(iterable)
        self.slices = [slice(i, i+size) for i in range(0, len(self.seq), size)]

    def __contains__(self, item):
        # checks if an item is in this sequence
        return item in self.seq

    def __iter__(self):
        """ iterates over all slices """
        return (self.seq[slice] for slice in self.slices)

    def __len__(self):
        """ implements len( .. ) """
        return len(self.slices)

    def __getitem__(self, n):
        # two forms of getitem ..
        if isinstance(n, slice):
            # implements sliced[a:b]
            return [self.seq[x] for x in self.slices[n]]
        else:
            # implements sliced[a]
            return self.seq[self.slices[n]]
s = SlicedList(range(100), 5)
# length
print len(s) # 20
#iteration
print list(s) # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], ... , [95, 96, 97, 98, 99]]
# explicit iteration:
it = iter(s)
print next(it) # [0, 1, 2, 3, 4]
# we can slice it too
print s[0], s[-1] # [0, 1, 2, 3, 4] [95, 96, 97, 98, 99]
# get the first two
print s[0:2] # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]
# every other item
print s[::2] # [[0, 1, 2, 3, 4], [10, 11, 12, 13, 14], [20, 21, 22, 23, 24], ... ]
Now if you really want methods like .start (what for, anyway? It's just a verbose way of writing [0]), you can write a class like this:
class Navigator(object):
    def __init__(self, seq):
        self.c = 0
        self.seq = seq

    def next(self):
        self.c += 1
        return self.seq[self.c]

    def prev(self):
        self.c -= 1
        return self.seq[self.c]

    def start(self):
        self.c = 0
        return self.seq[self.c]

    def end(self):
        self.c = len(self.seq) - 1
        return self.seq[self.c]

n = Navigator(SlicedList(range(100), 5))
print n.start(), n.next(), n.prev(), n.end()
The raw data that I want to slice is very big, for example an 8 GB text file... I may not be able to read all of it into a list - I do not have that much physical memory. In that case, is it possible to get only the lines/records I want by this class?
No, as it stands, the class originally proposed below converts the iterator into a list, which makes it 100% useless for your situation.
Just use the grouper idiom (also mentioned below).
You'll have to be smart about remembering previous groups.
To save on memory, only store those previous groups that you need.
For example, if you only need the most recent previous group, you could store that in
a single variable, previous_group.
If you need the 5 most recent previous groups, you could use a collections.deque with a maximum size of 5, as in the sketch below.
Or, you could use the window idiom to get a sliding window of n groups of groups...
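A minimal sketch of that deque suggestion, assuming Python 2 (swap itertools.izip for plain zip on Python 3; the function name is hypothetical):

import itertools
import collections

def grouped_with_history(iterable, size, history=5):
    it = iter(iterable)
    recent = collections.deque(maxlen=history)  # drops the oldest group automatically
    for group in itertools.izip(*[it] * size):  # the grouper idiom shown below
        recent.append(group)
        yield group, list(recent)  # the current group plus the most recent ones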
Given what you've told us so far, I would not define a class for this, because I don't see many reusable elements to the solution.
Mainly, what you want can be done with the grouper idiom:
In [22]: l = xrange(100)
In [23]: itl=iter(l)
In [24]: import itertools
In [25]: for y in itertools.izip(*[itl]*5):
....: print(y)
(0, 1, 2, 3, 4)
(5, 6, 7, 8, 9)
(10, 11, 12, 13, 14)
...
(95, 96, 97, 98, 99)
Calling next is no problem:
In [28]: l = xrange(100)
In [29]: itl=itertools.izip(*[iter(l)]*5)
In [30]: next(itl)
Out[30]: (0, 1, 2, 3, 4)
In [31]: next(itl)
Out[31]: (5, 6, 7, 8, 9)
But making a previous method is a big problem, because iterators don't work this way. Iterators are meant to produce values without remembering past values.
If you need all past values, then you need a list, not an iterator:
In [32]: l = xrange(100)
In [33]: ll=list(itertools.izip(*[iter(l)]*5))
In [34]: ll[0]
Out[34]: (0, 1, 2, 3, 4)
In [35]: ll[1]
Out[35]: (5, 6, 7, 8, 9)
# Get the last group
In [36]: ll[-1]
Out[36]: (95, 96, 97, 98, 99)
Now getting the previous group is just a matter of keeping track of the list index.

How to write a generator that returns ALL-BUT-LAST items in the iterable in Python?

I asked some similar questions [1, 2] yesterday and got great answers, but I am not yet technically skilled enough to write a generator of such sophistication myself.
How could I write a generator that would raise StopIteration if it's the last item, instead of yielding it?
I am thinking I should somehow ask two values at a time, and see if the 2nd value is StopIteration. If it is, then instead of yielding the first value, I should raise this StopIteration. But somehow I should also remember the 2nd value that I asked if it wasn't StopIteration.
I don't know how to write it myself. Please help.
For example, if the iterable is [1, 2, 3], then the generator should return 1 and 2.
Thanks, Boda Cydo.
[1] How do I modify a generator in Python?
[2] How to determine if the value is ONE-BUT-LAST in a Python generator?
This should do the trick:
def allbutlast(iterable):
    it = iter(iterable)
    current = it.next()
    for i in it:
        yield current
        current = i
>>> list(allbutlast([1,2,3]))
[1, 2]
This will iterate through the entire list, and return the previous item so the last item is never returned.
Note that calling the above on both [] and [1] will return an empty list.
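On Python 3.7+ the bare it.next() above needs adjusting (PEP 479 turns a StopIteration escaping a generator into a RuntimeError); a compatible sketch:

def allbutlast(iterable):
    it = iter(iterable)
    try:
        current = next(it)
    except StopIteration:
        return  # empty input: yield nothing
    for item in it:
        yield current
        current = item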
First off, is a generator really needed? This sounds like the perfect job for Python's slice syntax:
result = my_range[:-1]
I.e.: take a range from the first item to the one before the last.
The itertools module shows a pairwise() recipe in its documentation. Adapting that recipe, you can get your generator:

from itertools import tee

def n_apart(iterable, n):
    a, b = tee(iterable)
    for count in range(n):
        next(b)
    return zip(a, b)

def all_but_n_last(iterable, n):
    return (value for value, dummy in n_apart(iterable, n))

The n_apart() function returns pairs of values which are n elements apart in the input iterable, ignoring incomplete pairs. all_but_n_last() returns the first value of each pair, which incidentally ignores the n last elements of the list.
>>> data = range(10)
>>> list(data)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(n_apart(data,3))
[(0, 3), (1, 4), (2, 5), (3, 6), (4, 7), (5, 8), (6, 9)]
>>> list(all_but_n_last(data,3))
[0, 1, 2, 3, 4, 5, 6]
>>>
>>> list(all_but_n_last(data,1))
[0, 1, 2, 3, 4, 5, 6, 7, 8]
The more_itertools project has a tool that emulates itertools.islice with support for negative indices:
import more_itertools as mit
list(mit.islice_extended([1, 2, 3], None, -1))
# [1, 2]
gen = (x for x in iterable[:-1])
