Looping through a list that gets updated - python

I am working with two columns. One has a few individual numbers and the other has sums. I'd like to match up list1 and list2. At each iteration, I remove the numbers that were matched up. There will not necessarily be a match for all numbers, but I'd like to get as many as possible. How do I make sure the loop continues to iterate but terminates once all matches have been made?
Ideally, in the example below, I would like to end up with something like:
[5,6], 11
[2,3], 5
Of course, if the whole approach is incorrect, please feel free to advise.
Thank you for your kind help.
import itertools
list1=[5,6,2,3,8,7]
list2=[11,5]
combos=list(itertools.combinations(list1, 2))
for i in range(len(list1)):
    if sum(combos[i]) in list2:
        list1.remove(combos[i][0])
        list1.remove(combos[i][1])
        list2.remove(sum(combos[i]))
        combos=list(itertools.combinations(list1, 2))
        print(combos[i])

This finds a set of matches, though not necessarily the best one:
import itertools
list1=[5,6,2,3,8,7]
list2=[11,5]
combos=list(itertools.combinations(list1, 2))
result=[]
for i in range(len(combos)):
    # both members of the pair must still be available in list1
    if sum(combos[i]) in list2 and (combos[i][0] in list1 and combos[i][1] in list1):
        result.append(combos[i])
        list1.remove(combos[i][0])
        list1.remove(combos[i][1])
        list2.remove(sum(combos[i]))
print(result)
#Output: [(5, 6), (2, 3)]
For the inputs below it finds only some of the matches rather than the maximum number (the result depends on the order in which the combinations are checked):
list1=[5,6,2,8,3,7]
list2=[11,5,10,15]
# Output: [(5, 6), (2, 8)]
list1=[5,6,2,3,8,7]
list2=[11,5,10,15]
# Output: [(5, 6), (2, 3), (8, 7)]
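Because the greedy loop above depends on the order in which pairs are checked, one way to get the maximum number of matches is to try every choice and backtrack. The following is only a minimal sketch, assuming pairs only (as in the question) and small inputs, since the search is exponential; the name best_pair_matches is made up for illustration.

```python
from itertools import combinations

def best_pair_matches(numbers, sums):
    """Return the longest list of ((a, b), total) matches found by exhaustive search."""
    best = []

    def search(remaining, targets, chosen):
        nonlocal best
        # keep the first longest selection seen so far
        if len(chosen) > len(best):
            best = list(chosen)
        # try every pair of positions that is still available
        for i, j in combinations(range(len(remaining)), 2):
            total = remaining[i] + remaining[j]
            if total in targets:
                rest = [v for k, v in enumerate(remaining) if k not in (i, j)]
                left = list(targets)
                left.remove(total)  # each target sum can be used only once
                chosen.append(((remaining[i], remaining[j]), total))
                search(rest, left, chosen)
                chosen.pop()  # backtrack and try another pair

    search(list(numbers), list(sums), [])
    return best

for pair, total in best_pair_matches([5, 6, 2, 8, 3, 7], [11, 5, 10, 15]):
    print(list(pair), total)
```

For the second input above, this finds three matches where the greedy loop found two. The input lists are copied, so they are left unchanged.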

Since your question is not fully specified, I assume that:
there are two list-like containers holding unsorted sequences of elements (ints);
elements may repeat in both containers;
each combination of any length from the first sequence can be "matched" to a corresponding element in the second container via a special function (sum);
as many such matches as possible (maybe all) should be found and reported as soon as they are found (via yield or print);
all unique elements belonging to a match should be removed from both containers as soon as the match is found;
at most one copy may be created per container;
the matching algorithm should be finite.
The proposed solution is based on these assumptions and satisfies almost all of them. However, it can be simplified significantly if you specify the conditions more precisely. What are the exact container types? It would be nice if the data type could be collections.deque or set, since removing elements from a list object is quite expensive. If your data should be updated, when should that happen (while searching for matches, or afterwards)? Are there any duplicate elements in the data? Is it allowed to create copies of the input containers? If so, what is the maximum size or number of such copies? Is it allowed to make any other changes to the containers while searching for matches? Should the algorithm stop as soon as either container becomes empty?
from itertools import combinations, chain
def update_data_matches(data, matches, comb_lengths, get_match, discard_all, discard_all_from):
    # itertools.combinations operates with tuple-copy of any input data,
    # which type is not tuple. We need to create one copy explicitly
    # to minimize number of created copies:
    data_copy = tuple(data)
    if isinstance(comb_lengths, int):
        combs = combinations(data_copy, comb_lengths)
    else:
        combs = chain.from_iterable(combinations(data_copy, cl) for cl in comb_lengths)
    matches_copy = frozenset(matches)
    for comb in combs:
        if (possible_match := get_match(comb)) in matches_copy:
            discard_all(matches, possible_match)
            discard_all_from(data, comb)
            yield comb
def lst_discard_all(lst, element, i=0):
    try:
        while True:
            del lst[(i := lst.index(element, i))]
    except ValueError:
        return
def lst_discard_all_from(lst, elements):
    for i in range(len(lst)-1, -1, -1):
        if lst[i] in elements:
            del lst[i]
def lstset_discard_all(lst, element):
    try:
        lst.remove(element)
    except ValueError:
        return
def lstset_discard_all_from(lst, elements):
    for e in elements:
        try:
            lst.remove(e)
        except ValueError:
            continue
def update_lst_data_matches(*args, **kwargs):
    return update_data_matches(*args, discard_all=lst_discard_all,
                               discard_all_from=lst_discard_all_from, **kwargs)
def update_set_data_matches(*args, **kwargs):
    return update_data_matches(*args, discard_all=set.discard,
                               discard_all_from=set.difference_update, **kwargs)
def update_lstset_data_matches(*args, **kwargs):
    return update_data_matches(*args, discard_all=lstset_discard_all,
                               discard_all_from=lstset_discard_all_from, **kwargs)
In your case:
data1 = [5,6,2,3,8,7]
data2 = [11,5]
for match in update_lstset_data_matches(data1, data2, 2, sum):
print(f'{match=}', f'{sum(match)=}', f'{data1=}', f'{data2=}')
Result:
match=(5, 6) sum(match)=11 data1=[2, 3, 8, 7] data2=[5]
match=(2, 3) sum(match)=5 data1=[8, 7] data2=[]
match=(3, 8) sum(match)=11 data1=[7] data2=[]
Example 2: find all sum matches with combinations of all length. Related problem https://en.wikipedia.org/wiki/Subset_sum_problem
data3 = {-7, -3, 66, -2, 5, 8}
data4 = {0, 13, -9, 4, 19}
for match in update_set_data_matches(data3, data4, range(1, len(data3)), sum):
print(f'{match=}', f'{sum(match)=}', f'{data3=}', f'{data4=}')
Result:
match=(5, 8) sum(match)=13 data3={66, -7, -3, -2} data4={0, 4, 19, -9}
match=(-7, -2) sum(match)=-9 data3={66, -3} data4={0, 4, 19}
match=(5, -3, -2) sum(match)=0 data3={66} data4={4, 19}
match=(5, 8, -7, -2) sum(match)=4 data3={66} data4={19}
Example 3:
data5 = [11, 123, 3, 66, -2, 11, 8, 66, 3.0, 3]
data6 = [0, 13, -9, 4.0, 123, 4, 19, 0]
for match in update_lst_data_matches(data5, data6, range(1, len(data5)), sum):
print(f'{match=}', f'{sum(match)=}', f'{data5=}', f'{data6=}')
Result:
match=(123,) sum(match)=123 data5=[11, 3, 66, -2, 11, 8, 66, 3.0, 3] data6=[0, 13, -9, 4.0, 4, 19, 0]
match=(11, 8) sum(match)=19 data5=[3, 66, -2, 66, 3.0, 3] data6=[0, 13, -9, 4.0, 4, 0]
match=(11, 8) sum(match)=19 data5=[3, 66, -2, 66, 3.0, 3] data6=[0, 13, -9, 4.0, 4, 0]
match=(3, -2, 3.0) sum(match)=4.0 data5=[66, 66] data6=[0, 13, -9, 0]
match=(3, -2, 3) sum(match)=4 data5=[66, 66] data6=[0, 13, -9, 0]
match=(-2, 3.0, 3) sum(match)=4.0 data5=[66, 66] data6=[0, 13, -9, 0]

Related

How to write a function that accepts two tuples, and returns the merged tuple in which all integers appears in ascending order?

Write a function merge(tup1, tup2) that accepts two sorted tuples as parameters, and returns the merged tuple in which all integers appear in ascending order.
You may assume that:
tup1 and tup2 each contain distinct integers sorted in ascending order.
Integers in tup1 are different from those in tup2.
Length of tuples may also vary.
I can't use Python's sorting function.
I've tried something like this, but failed public test cases such as:
merge((-1, 1, 3, 5), (-2, 4, 6, 7)) → (-2, -1, 1, 3, 4, 5, 6, 7)
merge((-3, 8, 67, 100, 207), (-10, 20, 30, 40, 65, 80, 90)) → (-10, -3, 8, 20, 30, 40, 65, 67, 80, 90, 100, 207)
merge((-1, 1, 3, 5, 7, 9, 11), (-2, 0, 2, 4, 6)) → (-2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 9, 11)
def merge(tup1, tup2):
    size_1 = len(tup1)
    size_2 = len(tup2)
    res = ()
    i, j = 0, 0
    while i < size_1 and j < size_2:
        if tup1(i) < tup2(j):
            res.append(tup1(i))
            i += 1
        else:
            res.append(tup2(j))
            j += 1
    return res = res + tup1(i:) + tup2(j:)
Your code (algorithm) is fine, you just have a few syntax errors:
Indexing is done with [..], not (...).
You can either return or assign - not both.
And a logical (attribute) error:
Tuples don't have append - you can concatenate them by using addition (or just by using lists and converting in the end to a tuple).
Your code with the syntax errors fixed and using lists seems to work fine:
def merge(tup1, tup2):
    size_1 = len(tup1)
    size_2 = len(tup2)
    res = []
    i, j = 0, 0
    while i < size_1 and j < size_2:
        if tup1[i] < tup2[j]:
            res.append(tup1[i])
            i += 1
        else:
            res.append(tup2[j])
            j += 1
    # one of the tuples is exhausted; copy whatever is left of the other
    res.extend(tup1[i:])
    res.extend(tup2[j:])
    # convert back, since the task asks for a tuple
    return tuple(res)
Unpack both tuples into a list using the * operator, sort, and convert to a tuple (this only applies if sorted is allowed).
merge = lambda t1, t2: tuple(sorted([*t1, *t2]))
Since tuples are immutable in Python, I would loop over them to copy the items one by one into a list, sort that list with .sort(), and then convert it into a tuple.
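If the restriction only rules out sorted() and .sort(), the standard library's heapq.merge is another option: it merges inputs that are already sorted without re-sorting them. This is a minimal sketch; whether the assignment would accept it is an assumption, and merge_sorted is a made-up name.

```python
from heapq import merge as heap_merge

def merge_sorted(tup1, tup2):
    """Merge two ascending tuples into one ascending tuple."""
    # heapq.merge lazily yields items from already-sorted inputs in order
    return tuple(heap_merge(tup1, tup2))

print(merge_sorted((-1, 1, 3, 5), (-2, 4, 6, 7)))
```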

Python: Extracting lists from list with module or regular expression

I'm trying to extract lists/sublists from one bigger integer-list with Python2.7 by using start- and end-patterns. I would like to do it with a function, but I can't find a library, algorithm or a regular expression for solving this problem.
def myFunctionForSublists(data, startSequence, endSequence):
    # ... todo
data = [99, 99, 1, 2, 3, 99, 99, 99, 4, 5, 6, 99, 99, 1, 2, 3, 99, 4, 5, 6, 99]
startSequence = [1,2,3]
endSequence = [4,5,6]
sublists = myFunctionForSublists(data, startSequence, endSequence)
print sublists[0] # [1, 2, 3, 99, 99, 99, 4, 5, 6]
print sublists[1] # [1, 2, 3, 99, 4, 5, 6]
Any ideas how I can realize it?
Here's a more general solution that doesn't require the lists being sliceable, so you can use it on other iterables, like generators.
We keep a deque the size of the start sequence until we come across it. Then we add those values to a list, and keep iterating over the sequence. As we do, we keep a deque the size of the end sequence, until we see it, also adding the elements to the list we're keeping. If we come across the end sequence, we yield that list and set the deque up to scan for the next start sequence.
from collections import deque
def gen(l, start, stop):
    start_deque = deque(start)
    end_deque = deque(stop)
    curr_deque = deque(maxlen=len(start))
    it = iter(l)
    for c in it:
        curr_deque.append(c)
        if curr_deque == start_deque:
            potential = list(curr_deque)
            curr_deque = deque(maxlen=len(stop))
            for c in it:
                potential.append(c)
                curr_deque.append(c)
                if curr_deque == end_deque:
                    yield potential
                    curr_deque = deque(maxlen=len(start))
                    break
print(list(gen([99, 99, 1, 2, 3, 99, 99, 99, 4, 5, 6, 99, 99, 1, 2, 3, 99, 4, 5, 6, 99], [1,2,3], [4,5,6])))
# [[1, 2, 3, 99, 99, 99, 4, 5, 6], [1, 2, 3, 99, 4, 5, 6]]
Here is an itertools approach that uses a collections.deque of limited length to keep a buffer of the last elements of appropriate size. It assumes that your sublists don't overlap and that your start and end sequences don't overlap either.
It works for any sequence for data, start, end (even generators).
from collections import deque
from itertools import islice
def sublists(data, start, end):
    it = iter(data)
    start, end = deque(start), deque(end)
    try:
        while True:
            x = deque(islice(it, len(start)), len(start))
            # move forward until start is found
            while x != start:
                x.append(next(it))
            out = list(x)
            x = deque(islice(it, len(end)), len(end))
            # move forward until end is found, storing the sublist
            while x != end:
                out.append(x[0])
                x.append(next(it))
            out.extend(end)
            yield out
    except StopIteration:
        # data is exhausted; since PEP 479 (Python 3.7) a StopIteration that
        # escapes a generator would otherwise become a RuntimeError
        return
data = [99, 99, 1, 2, 3, 99, 99, 99, 4, 5, 6, 99, 99, 1, 2, 3, 99, 4, 5, 6, 99]
startSequence = [1,2,3]
endSequence = [4,5,6]
print(list(sublists(data, startSequence, endSequence)))
# [[1, 2, 3, 99, 99, 99, 4, 5, 6], [1, 2, 3, 99, 4, 5, 6]]
If you really want to use regular expressions, you can convert the lists of integers to strings and apply the regex to those:
import re
def find_span(numbers, start, end):
    # Create strings from the start and end lists, escaping characters
    # that are special in a regex (e.g. 40 is '(' and 42 is '*').
    start_pattern = re.escape(''.join(map(chr, start)))
    end_pattern = re.escape(''.join(map(chr, end)))
    # convert the list to search into one string.
    s = ''.join(map(chr, numbers))
    # Create a pattern that starts and ends with the correct sublists,
    # and match all sublists. Then convert each match back to a list of
    # integers
    # The '?' is to make the regex non-greedy
    return [
        [ord(c) for c in match]
        for match in re.findall(rf'{start_pattern}.*?{end_pattern}', s, re.DOTALL)
    ]
>>> find_span(search, start, end) # Using OP's sample values
[[1, 2, 3, 99, 99, 99, 4, 5, 6], [1, 2, 3, 99, 4, 5, 6]]
Note this is not really efficient, since it requires dynamically building a regex each time it's called. You also need re.DOTALL, because otherwise it won't match anything containing 10 (the ASCII code for newline). It only works for non-negative integers that chr() accepts. However, if you really want to use regexes, this would work.
Just iterate over all indices in the list and compare the slice starting there to the startSequence or the endSequence, respectively. Assuming that the sublists are not supposed to overlap, you can use the same iterator for both loops.
def myFunctionForSublists(data, startSequence, endSequence):
    positions = iter(range(len(data)))
    for start in positions:
        if data[start:start+len(startSequence)] == startSequence:
            for end in positions:
                if data[end:end+len(endSequence)] == endSequence:
                    yield data[start:end+len(endSequence)]
                    break
This way, the start loop will continue where the end loop left. If they can overlap, use two separate iterators for the loop, i.e. for start in range(len(data)): and for end in range(start+1, len(data)):
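The overlapping variant described in the note above could look roughly like this. It is a sketch only: the function name overlapping_sublists and the sample data are made up for illustration.

```python
def overlapping_sublists(data, start_seq, end_seq):
    """Yield data[start:end] spans; a later span may begin inside an earlier one."""
    # the outer loop visits every index, so it does not skip past a found span
    for start in range(len(data)):
        if data[start:start + len(start_seq)] == start_seq:
            # the inner loop has its own range, independent of the outer one
            for end in range(start + 1, len(data)):
                if data[end:end + len(end_seq)] == end_seq:
                    yield data[start:end + len(end_seq)]
                    break

sample = [1, 2, 3, 1, 2, 3, 4, 5, 6]
print(list(overlapping_sublists(sample, [1, 2, 3], [4, 5, 6])))
```

Here the second start pattern lies inside the first span, so both spans are reported; the shared-iterator version would report only the first.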
Use the method below:
def find_sub_list(sl,l):
    sll=len(sl)
    for ind in (i for i,e in enumerate(l) if e==sl[0]):
        if l[ind:ind+sll]==sl:
            return ind,ind+sll-1
find_sub_list([1,2,3], data)
>>>(2, 4)
find_sub_list([4,5,6], data)
>>>(8, 10)
data[2:10+1]
>>>[1, 2, 3, 99, 99, 99, 4, 5, 6]
You can follow similar approach for sublists[1]
Courtesy : find-starting-and-ending-indices-of-sublist-in-list
Here is an O(n) solution that finds matches by keeping track of where startSequence and endSequence occur. It compares windows of len(startSequence) items against both patterns, so it assumes the two patterns have the same length.
def myFunctionForSublists(data, startSequence, endSequence):
    start,end = tuple(startSequence), tuple(endSequence)
    l1, l2 = len(start), len(end)
    s = -1
    result = []
    for i,v in enumerate(zip(*[data[i:] for i in range(0,l1)])):
        if v == start:
            s = i
        if v == end and s != -1:
            result.append(data[s:i+l2])
            s = -1
    return result
print (myFunctionForSublists(data, startSequence, endSequence))
# [[1, 2, 3, 99, 99, 99, 4, 5, 6], [1, 2, 3, 99, 4, 5, 6]]

Python - summing and grouping through a list

I have a big list of numbers like so:
a = [133000, 126000, 123000, 108000, 96700, 96500, 93800,
93200, 92100, 90000, 88600, 87000, 84300, 82400, 80700,
79900, 79000, 78800, 76100, 75000, 15300, 15200, 15100,
8660, 8640, 8620, 8530, 2590, 2590, 2580, 2550, 2540, 2540,
2510, 2510, 1290, 1280, 1280, 1280, 1280, 951, 948, 948,
947, 946, 945, 609, 602, 600, 599, 592, 592, 592, 591, 583]
What I want to do is cycle through this list one by one checking if a value is above a certain threshold (for example 40000). If it is above this threshold we put that value in a new list and forget about it. Otherwise we wait until the sum of the values is above the threshold and when it is we put the values in a list and then continue cycling. At the end, if the final values don't sum to the threshold we just add them to the last list.
If I'm not being clear consider the simple example, with the threshold being 15
[20, 10, 9, 8, 8, 7, 6, 2, 1]
The final list should look like this:
[[20], [10, 9], [8, 8], [7, 6, 2, 1]]
I'm really bad at maths and python and I'm at my wits end. I have some basic code I came up with but it doesn't really work:
def sortthislist(list):
    list = a
    newlist = []
    for i in range(len(list)):
        while sum(list[i]) >= 40000:
            newlist.append(list[i])
    return newlist
Any help at all would be greatly appreciated. Sorry for the long post.
The function below accepts your input list and a limit, and returns the grouped list:
a = [20, 10, 9, 8, 8, 7, 6, 2, 1]
def func(a, lim):
    out = []
    temp = []
    for i in a:
        if i > lim:
            out.append([i])
        else:
            temp.append(i)
            if sum(temp) > lim:
                out.append(temp)
                temp = []
    # leftovers that never exceeded the limit join the last group
    if temp:
        if out:
            out[-1].extend(temp)
        else:
            out.append(temp)
    return out
print(func(a, 15))
# [[20], [10, 9], [8, 8], [7, 6, 2, 1]]
With Python you can iterate over the list itself rather than over its indices, which is why I use for i in a rather than for i in range(len(a)).
Within the function, out is the list that is returned at the end; temp is a temporary list that is filled with numbers until its sum exceeds lim, at which point temp is appended to out and replaced with an empty list. Any values still in temp after the loop are added to the last group, as the question asks.
def group(L, threshold):
    answer = []
    start = 0
    sofar = L[0]
    for i,num in enumerate(L[1:],1):
        if sofar >= threshold:
            answer.append(L[start:i])
            sofar = num
            start = i
        else:
            sofar += num
    # handle whatever is left after the loop
    if sofar >= threshold or not answer:
        answer.append(L[start:])
    else:
        # leftovers that never reached the threshold join the last group
        answer[-1].extend(L[start:])
    return answer
Output:
In [4]: group([20, 10, 9, 8, 8, 7, 6, 2, 1], 15)
Out[4]: [[20], [10, 9], [8, 8], [7, 6, 2, 1]]
Hope this will help :)
vlist = [20, 10,3,9, 7,6,5,4]
threshold = 15
result = []
tmp = []
for v in vlist:
    if v > threshold:
        tmp.append(v)
        result.append(tmp)
        tmp = []
    elif sum(tmp) + v > threshold:
        tmp.append(v)
        result.append(tmp)
        tmp = []
    else:
        tmp.append(v)
if tmp != []:
    result.append(tmp)
Here is the result:
[[20], [10, 3, 9], [7, 6, 5], [4]]
Here's yet another way:
def group_by_sum(a, lim):
    out = []
    group = None
    for i in a:
        if group is None:
            group = []
            out.append(group)
        group.append(i)
        if sum(group) > lim:
            group = None
    return out
print(group_by_sum(a, 15))
We already have plenty of working answers, but here are two other approaches.
We can use itertools.groupby to collect such groups, given a stateful accumulator that understands the contents of the group. We end up with a set of (key,group) pairs, so some additional filtering gets us only the groups. Additionally since itertools provides iterators, we convert them to lists for printing.
from itertools import groupby
class Thresholder:
    def __init__(self, threshold):
        self.threshold=threshold
        self.sum=0
        self.group=0
    def __call__(self, value):
        if self.sum>self.threshold:
            self.sum=value
            self.group+=1
        else:
            self.sum+=value
        return self.group
print([list(g) for k,g in groupby([20, 10, 9, 8, 8, 7, 6, 2, 1], Thresholder(15))])
The operation can also be done as a single reduce call:
from functools import reduce  # reduce is a builtin only in Python 2
def accumulator(result, value):
    last=result[-1]
    if sum(last)>threshold:
        result.append([value])
    else:
        last.append(value)
    return result
threshold=15
print(reduce(accumulator, [20, 10, 9, 8, 8, 7, 6, 2, 1], [[]]))
This version scales poorly to many values due to the repeated call to sum(), and the global variable for the threshold is rather clumsy. Also, calling it for an empty list will still leave one empty group.
Edit: The question logic demands that values above the threshold get put in their own groups (not sharing with collected smaller values). I did not think of that while writing these versions, but the accepted answer by Ffisegydd handles it. There is no effective difference if the input data is sorted in descending order, as all the sample data appears to be.
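The repeated sum() calls mentioned above can be avoided by carrying a running total. The sketch below assumes the input is sorted in descending order (as the sample data is), so a value above the threshold always starts its own group. The name group_by_threshold is made up, and merging the leftovers into the last group follows the question's description.

```python
def group_by_threshold(values, threshold):
    """Group consecutive values until each group's sum exceeds threshold."""
    groups, current, running = [], [], 0
    for value in values:
        current.append(value)
        running += value  # running total instead of calling sum() each time
        if running > threshold:
            groups.append(current)
            current, running = [], 0
    if current:
        # leftovers that never exceeded the threshold join the last group
        if groups:
            groups[-1].extend(current)
        else:
            groups.append(current)
    return groups

print(group_by_threshold([20, 10, 9, 8, 8, 7, 6, 2, 1], 15))
```

The threshold is passed as a parameter, which also avoids the global variable used in the reduce version.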

Traversing a sequence of generators

I have a sequence of generators: (gen_0, gen_1, ... gen_n)
These generators will create their values lazily but are finite and will have potentially different lengths.
I need to be able to construct another generator that yields the first element of each generator in order, followed by the second and so forth, skipping values from generators that have been exhausted.
I think this problem is analogous to taking the tuple
((1, 4, 7, 10, 13, 16), (2, 5, 8, 11, 14), (3, 6, 9, 12, 15, 17, 18))
and traversing it so that it would yield the numbers from 1 through 18 in order.
I'm working on solving this simple example using (genA, genB, genC) with genA yielding values from (1, 4, 7, 10, 13, 16), genB yielding (2, 5, 8, 11, 14) and genC yielding (3, 6, 9, 12, 15, 17, 18).
To solve the simpler problem with the tuple of tuples the answer is fairly simple if the
elements of the tuple were the same length. If the variable 'a' referred to the tuple, you could use
[i for t in zip(*a) for i in t]
Unfortunately the items are not necessarily the same length and the zip trick doesn't seem to work for generators anyway.
So far my code is horribly ugly and I'm failing to find anything approaching a clean solution. Help?
I think you need itertools.izip_longest
>>> list([e for e in t if e is not None] for t in itertools.izip_longest(*some_gen,
fillvalue=None))
[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15], [16, 17], [18]]
>>>
If you look at the documentation for itertools.izip_longest, you'll see that it gives a pure-Python implementation. It's easy to modify this implementation so that it produces the results you need instead (that is, just like izip_longest, but without any fillvalue):
class ZipExhausted(Exception):
    pass
def izip_longest_nofill(*args):
    """
    Return a generator whose .next() method returns a tuple where the
    i-th element comes from the i-th iterable argument that has not
    yet been exhausted. The .next() method continues until all
    iterables in the argument sequence have been exhausted and then it
    raises StopIteration.
    >>> list(izip_longest_nofill(*[xrange(i,2*i) for i in 2,3,5]))
    [(2, 3, 5), (3, 4, 6), (5, 7), (8,), (9,)]
    """
    # list() keeps this a real, mutable list on Python 3 as well
    iterators = list(map(iter, args))
    def zip_next():
        i = 0
        while i < len(iterators):
            try:
                yield next(iterators[i])
                i += 1
            except StopIteration:
                del iterators[i]
        if i == 0:
            raise ZipExhausted
    try:
        while iterators:
            yield tuple(zip_next())
    except ZipExhausted:
        pass
This avoids the need to re-filter the output of izip_longest to discard the fillvalues. Alternatively, if you want a "flattened" output:
def iter_round_robin(*args):
    """
    Return a generator whose .next() method cycles round the iterable
    arguments in turn (ignoring ones that have been exhausted). The
    .next() method continues until all iterables in the argument
    sequence have been exhausted and then it raises StopIteration.
    >>> list(iter_round_robin(*[xrange(i) for i in 2,3,5]))
    [0, 0, 0, 1, 1, 1, 2, 2, 3, 4]
    """
    # list() keeps this a real, mutable list on Python 3 as well
    iterators = list(map(iter, args))
    while iterators:
        i = 0
        while i < len(iterators):
            try:
                yield next(iterators[i])
                i += 1
            except StopIteration:
                del iterators[i]
Another itertools option if you want them all collapsed in a single list; this (as @gg.kaspersky already pointed out in another thread) does not handle generated None values.
g = (generator1, generator2, generator3)
res = [e for e in itertools.chain(*itertools.izip_longest(*g)) if e is not None]
print res
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]
You might consider itertools.izip_longest, but in case None is a valid value, that solution will fail. Here is a sample "another generator", which does exactly what you asked for, and is pretty clean:
def my_gen(generators):
    while True:
        rez = ()
        for gen in generators:
            try:
                rez = rez + (gen.next(),)
            except StopIteration:
                pass
        if rez:
            yield rez
        else:
            break
print [x for x in my_gen((iter(xrange(2)), iter(xrange(3)), iter(xrange(1))))]
[(0, 0, 0), (1, 1), (2,)] #output
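The answers above target Python 2 (izip_longest, xrange, gen.next()). On Python 3 the same idea can be sketched with itertools.zip_longest and a private sentinel as the fill value, so that None remains a legitimate value. The names round_robin and _MISSING are made up for illustration.

```python
from itertools import zip_longest

_MISSING = object()  # unique fill value that cannot collide with real data

def round_robin(*iterables):
    """Yield the first item of each iterable, then the second, and so on."""
    for row in zip_longest(*iterables, fillvalue=_MISSING):
        for item in row:
            if item is not _MISSING:  # skip slots of exhausted iterables
                yield item

gen_a = (x for x in (1, 4, 7, 10, 13, 16))
gen_b = (x for x in (2, 5, 8, 11, 14))
gen_c = (x for x in (3, 6, 9, 12, 15, 17, 18))
print(list(round_robin(gen_a, gen_b, gen_c)))
```

Since zip_longest pulls items lazily, the generators are only advanced as far as the consumer asks.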

Iteration over list slices

I want an algorithm to iterate over list slices. Slices size is set outside the function and can differ.
In my mind it is something like:
for list_of_x_items in fatherList:
    foo(list_of_x_items)
Is there a way to properly define list_of_x_items or some other way of doing this using python 2.5?
edit1: Clarification: both the "partitioning" and "sliding window" terms sound applicable to my task, but I am no expert. So I will explain the problem in a bit more depth and add to the question:
The fatherList is a multilevel numpy.array I am getting from a file. The function has to find averages of series (the user provides the length of the series). For averaging I am using the mean() function. Now for the expanded question:
edit2: How can I modify the function you have provided to store the extra items and use them when the next fatherList is fed to the function?
For example, if the list has length 10 and the chunk size is 3, then the 10th member of the list is stored and prepended to the next list.
Related:
What is the most “pythonic” way to iterate over a list in chunks?
If you want to divide a list into slices you can use this trick:
list_of_slices = zip(*(iter(the_list),) * slice_size)
For example
>>> zip(*(iter(range(10)),) * 3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
If the number of items is not divisible by the slice size and you want to pad the list with None, you can do this:
>>> map(None, *(iter(range(10)),) * 3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
It is a dirty little trick
OK, I'll explain how it works. It'll be tricky to explain but I'll try my best.
First a little background:
In Python you can multiply a list by a number like this:
[1, 2, 3] * 3 -> [1, 2, 3, 1, 2, 3, 1, 2, 3]
([1, 2, 3],) * 3 -> ([1, 2, 3], [1, 2, 3], [1, 2, 3])
And an iterator object can be consumed once like this:
>>> l=iter([1, 2, 3])
>>> l.next()
1
>>> l.next()
2
>>> l.next()
3
The zip function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. For example:
zip([1, 2, 3], [20, 30, 40]) -> [(1, 20), (2, 30), (3, 40)]
zip(*[(1, 20), (2, 30), (3, 40)]) -> [(1, 2, 3), (20, 30, 40)]
The * in front of the argument unpacks the list into separate positional arguments.
So
zip(*[(1, 20), (2, 30), (3, 40)])
is actually equivalent to
zip((1, 20), (2, 30), (3, 40))
but works with a variable number of arguments
Now back to the trick:
list_of_slices = zip(*(iter(the_list),) * slice_size)
iter(the_list) -> converts the list into an iterator
(iter(the_list),) * N -> creates a tuple of N references to the same the_list iterator.
zip(*(iter(the_list),) * N) -> feeds that tuple of iterators into zip, which groups their items into N-sized tuples. Since all N items are in fact references to the same iterator iter(the_list), building each tuple calls next() on that one iterator N times, so consecutive items end up in the same tuple.
I hope that explains it. I advise you to go with an easier-to-understand solution. I was only tempted to mention this trick because I like it.
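The examples above are Python 2, where zip returns a list and map(None, ...) pads with None. A rough Python 3 equivalent of the same trick is sketched below; the helper names chunks_drop_tail and chunks_padded are made up for illustration.

```python
from itertools import zip_longest

def chunks_drop_tail(seq, size):
    """Group seq into size-tuples, dropping an incomplete last group."""
    # [iter(seq)] * size is a list of `size` references to ONE iterator
    return list(zip(*[iter(seq)] * size))

def chunks_padded(seq, size, fill=None):
    """Group seq into size-tuples, padding the last group with fill."""
    return list(zip_longest(*[iter(seq)] * size, fillvalue=fill))

print(chunks_drop_tail(range(10), 3))
print(chunks_padded(range(10), 3))
```

zip stops at the first incomplete group, while zip_longest keeps going and fills the gaps, which is the role map(None, ...) played in Python 2.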
If you want to be able to consume any iterable you can use these functions:
from itertools import chain, islice
def ichunked(seq, chunksize):
    """Yields items from an iterator in iterable chunks."""
    it = iter(seq)
    while True:
        yield chain([it.next()], islice(it, chunksize-1))
def chunked(seq, chunksize):
    """Yields items from an iterator in list chunks."""
    for chunk in ichunked(seq, chunksize):
        yield list(chunk)
Use a generator:
big_list = [1,2,3,4,5,6,7,8,9]
slice_length = 3
def sliceIterator(lst, sliceLen):
    for i in range(len(lst) - sliceLen + 1):
        yield lst[i:i + sliceLen]
for slice in sliceIterator(big_list, slice_length):
    foo(slice)
sliceIterator implements a "sliding window" of width sliceLen over the sequence lst, i.e. it produces overlapping slices: [1,2,3], [2,3,4], [3,4,5], ... Not sure if that is the OP's intention, though.
Do you mean something like:
def callonslices(size, fatherList, foo):
    for i in xrange(0, len(fatherList), size):
        foo(fatherList[i:i+size])
If this is roughly the functionality you want you might, if you desire, dress it up a bit in a generator:
def sliceup(size, fatherList):
    for i in xrange(0, len(fatherList), size):
        yield fatherList[i:i+size]
and then:
def callonslices(size, fatherList, foo):
    for sli in sliceup(size, fatherList):
        foo(sli)
Answer to the last part of the question:
question update: How to modify the
function you have provided to store
the extra items and use them when the
next fatherList is fed to the
function?
If you need to store state then you can use an object for that.
class Chunker(object):
    """Split `iterable` on evenly sized chunks.
    Leftovers are remembered and yielded at the next call.
    """
    def __init__(self, chunksize):
        assert chunksize > 0
        self.chunksize = chunksize
        self.chunk = []
    def __call__(self, iterable):
        """Yield items from `iterable` `self.chunksize` at the time."""
        assert len(self.chunk) < self.chunksize
        for item in iterable:
            self.chunk.append(item)
            if len(self.chunk) == self.chunksize:
                # yield collected full chunk
                yield self.chunk
                self.chunk = []
Example:
chunker = Chunker(3)
for s in "abcd", "efgh":
    for chunk in chunker(s):
        print ''.join(chunk)
if chunker.chunk: # is there anything left?
    print ''.join(chunker.chunk)
Output:
abc
def
gh
I am not sure, but it seems you want to do what is called a moving average. numpy provides facilities for this (the convolve function).
>>> x = numpy.array(range(20))
>>> x
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19])
>>> n = 2 # moving average window
>>> numpy.convolve(numpy.ones(n)/n, x)[n-1:-n+1]
array([ 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5,
9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5])
The nice thing is that it accommodates different weighting schemes nicely (just change numpy.ones(n) / n to something else).
You can find more complete material here:
http://www.scipy.org/Cookbook/SignalSmooth
Expanding on the answer of @Ants Aasma: In Python 3.7 the handling of the StopIteration exception changed (according to PEP 479). A compatible version would be:
from itertools import chain, islice
def ichunked(seq, chunksize):
    it = iter(seq)
    while True:
        try:
            yield chain([next(it)], islice(it, chunksize - 1))
        except StopIteration:
            return
Your question could use some more detail, but how about:
def iterate_over_slices(the_list, slice_size):
    # + 1 so that the last full window is included
    for start in range(0, len(the_list)-slice_size+1):
        slice = the_list[start:start+slice_size]
        foo(slice)
For a near-one liner (after itertools import) in the vein of Nadia's answer dealing with non-chunk divisible sizes without padding:
>>> import itertools as itt
>>> chunksize = 5
>>> myseq = range(18)
>>> cnt = itt.count()
>>> print [ tuple(grp) for k,grp in itt.groupby(myseq, key=lambda x: cnt.next()//chunksize%2)]
[(0, 1, 2, 3, 4), (5, 6, 7, 8, 9), (10, 11, 12, 13, 14), (15, 16, 17)]
If you want, you can get rid of the itertools.count() requirement using enumerate(), with a rather uglier:
[ [e[1] for e in grp] for k,grp in itt.groupby(enumerate(myseq), key=lambda x: x[0]//chunksize%2) ]
(In this example the enumerate() would be superfluous, but not all sequences are neat ranges like this, obviously)
Nowhere near as neat as some other answers, but useful in a pinch, especially if already importing itertools.
A function that slices a list or an iterator into chunks of a given size. Also handles the case correctly if the last chunk is smaller:
def slice_iterator(data, slice_len):
    it = iter(data)
    while True:
        items = []
        for index in range(slice_len):
            try:
                item = next(it)
            except StopIteration:
                if items == []:
                    return # we are done
                else:
                    break # exits the "for" loop
            items.append(item)
        yield items
Usage example:
for slice in slice_iterator([1,2,3,4,5,6,7,8,9,10],3):
    print(slice)
Result:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10]
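A more compact variant of the same idea, for Python 3, combines itertools.islice with the two-argument form of iter(), which keeps calling a function until it returns the sentinel value (here an empty list). This is only a sketch, and the name chunked is chosen for illustration.

```python
from itertools import islice

def chunked(iterable, size):
    """Return an iterator over lists of up to `size` items from iterable."""
    it = iter(iterable)
    # iter(callable, sentinel) stops as soon as the callable returns []
    return iter(lambda: list(islice(it, size)), [])

for chunk in chunked([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], 3):
    print(chunk)
```

As with slice_iterator, the last chunk is simply shorter when the length is not a multiple of the chunk size.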
