Efficiently slice iterable by another iterable - python

I'm trying to slice an iterable to the same length as another iterable. For context, I was answering this question, which essentially asks for the sum of values grouped by key, and I think it can be done more efficiently:
from itertools import groupby
x = [(5, 65), (2, 12), (5, 18), (3, 35), (4, 49), (4, 10), (1, 27), (1, 1), (4, 71), (2, 41), (2, 17), (1, 25), (2, 62), (5, 65), (4, 5), (1, 51), (1, 13), (5, 92), (2, 62), (5, 81)]
keys, values = map(iter, zip(*sorted(x)))
print([sum(next(values) for _ in g) for _, g in groupby(keys)])
#[117, 194, 35, 135, 321]
I believe the next(values) for _ in g part can be written functionally or more concisely. Essentially, in pseudocode:
#from this
sum(next(values) for _ in g)
#to this
sum(values[length of g])
I know the above won't work. All I can think of is zip, because it stops at the end of the shortest iterable. When I tried that, though, it consumed more values than the group is long (and it isn't very readable). Here is what I tried:
print([sum(next(zip(*zip(values, g)))) for _, g in groupby(keys)])
#[117, 217, 10, 219, 92]
I've searched for this with no results, though I may not be searching for the right thing.
I've thought of other solutions, such as islice, but that needs the length of g, which is another messy solution. I could also just use operator.itemgetter, but if I can figure out how to do this more concisely, maybe I can reuse it in other solutions too.

You don't have to separate the keys and values at all. It can be handled by the key functions:
from operator import itemgetter as ig
[sum(map(ig(1), g)) for _, g in groupby(sorted(x), key=ig(0))]
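If the sort is only there to make groupby work, a dictionary accumulator is another option. The following is a minimal sketch, not part of the answer above: sum_by_key is a hypothetical helper name, and it sorts the distinct keys at the end only to reproduce the same output order.

```python
from collections import defaultdict

def sum_by_key(pairs):
    """Sum the second item of each pair, grouped by the first item."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    # Sort only the distinct keys, to match the order groupby(sorted(...)) gives.
    return [totals[key] for key in sorted(totals)]

x = [(5, 65), (2, 12), (5, 18), (3, 35), (4, 49), (4, 10), (1, 27), (1, 1),
     (4, 71), (2, 41), (2, 17), (1, 25), (2, 62), (5, 65), (4, 5), (1, 51),
     (1, 13), (5, 92), (2, 62), (5, 81)]
print(sum_by_key(x))  # [117, 194, 35, 135, 321]
```

This makes a single pass over the data and sorts only the distinct keys rather than every pair.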

You could use ilen from more-itertools and then islice:
[sum(islice(values, ilen(g))) for _, g in groupby(keys)]
Or with zip, but the group first:
[sum(x for _, x in zip(g, values)) for _, g in groupby(keys)]
Don't know how "efficient" these are for you, as you only showed very small data and I'm not sure how you'd generalize it (in particular, how long your groups are).
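The reason the group has to come first in zip can be shown with a small experiment. zip asks its arguments for items from left to right, so when values is on the left it has already handed over one extra item by the time zip notices the group is exhausted. The sketch below uses hypothetical helper names (leftover_after_zip, grouped_sums) and toy data.

```python
from itertools import groupby

def leftover_after_zip(values_first):
    """Zip a 2-item group with a 3-item values iterator; return what is left in values."""
    values = iter([10, 20, 30])
    group = iter("ab")
    pairs = zip(values, group) if values_first else zip(group, values)
    list(pairs)  # run the zip to completion
    return next(values, None)

print(leftover_after_zip(values_first=True))   # None: 30 was pulled and thrown away
print(leftover_after_zip(values_first=False))  # 30: still there for the next group

def grouped_sums(pairs):
    """Sum values per key, letting each group decide how many values to take."""
    keys, values = map(iter, zip(*sorted(pairs)))
    return [sum(v for _, v in zip(g, values)) for _, g in groupby(keys)]

print(grouped_sums([(2, 5), (1, 3), (1, 4)]))  # [7, 5]
```

With the group on the left, zip stops as soon as the group ends and never touches the next value.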

Maybe what you are asking can be accomplished by the following class. The class groupby_other takes an iterable it1 and an iterable of iterables it2: it2_0, it2_1, ... and yields groups of elements from it1 with lengths equal to it2_0, it2_1 and so on, until one of it1 or it2 is exhausted.
class groupby_other:
    """
    Make an iterator that returns groups from iterable1. Each group has the same
    number of elements as each element in iterable2.

    >>> y = [1,2,3,4,5]
    >>> z = [[1,1], [4,4,4]]
    >>> [list(x) for x in groupby_other(y,z)]
    [[1, 2], [3, 4, 5]]

    Grouping is terminated when one of the iterables is exhausted

    >>> y = [1,2,3,4,5]
    >>> z = [[1,1], [4,4,4], [4,5]]
    >>> [list(x) for x in groupby_other(y,z)]
    [[1, 2], [3, 4, 5]]
    >>> z = [[1,1], [4,4,4]]
    >>> [list(x) for x in groupby_other(y,z)]
    [[1, 2], [3, 4, 5]]
    >>> z = [[1,1], [4,4]]
    >>> [list(x) for x in groupby_other(y,z)]
    [[1, 2], [3, 4]]
    >>> [list(x) for x in groupby_other([],z)]
    []
    >>> [list(x) for x in groupby_other(y,[])]
    []
    """
    def __init__(self, iterable1, iterable2, key=None):
        self.it1 = iter(iterable1)
        self.it2 = iter(iterable2)
        self.current_it = None
        self.done = False

    def __iter__(self):
        return self

    def __next__(self):
        if self.done:
            raise StopIteration
        current_group = iter(next(self.it2)) # Exit on StopIteration
        current_item = next(self.it1)
        return self._grouper(current_item, current_group)

    def _grouper(self, current_item, current_group):
        try:
            next(current_group)
            yield current_item
            for _ in current_group:
                yield next(self.it1)
        except StopIteration:
            self.done = True
            return
Then you can do:
>>> [sum(x) for x in groupby_other(values, (g for _, g in groupby(keys)))]
[117, 194, 35, 135, 321]

Related

Is there a pythonic way to iterate over two lists one element at a time?

I have two lists: [1, 2, 3] and [10, 20, 30]. Is there a way to iterate moving one element in each list in each step? Ex
(1, 10)
(1, 20)
(2, 20)
(2, 30)
(3, 30)
I know zip moves one element in both lists in each step, but that's not what I'm looking for
Is it what you expect:
def zip2(l1, l2):
    for i, a in enumerate(l1):
        for b in l2[i:i+2]:
            yield (a, b)
>>> list(zip2(l1, l2))
[(1, 10), (1, 20), (2, 20), (2, 30), (3, 30)]
def dupe(l):
    return [val for val in l for _ in (0,1)]
list(zip(dupe([1,2,3]), dupe([10,20,30])[1:]))
# [(1, 10), (1, 20), (2, 20), (2, 30), (3, 30)]
One with zip and list comprehension.
For good measure, here's a solution that works with arbitrary iterables, not just indexable sequences:
def frobnicate(a, b):
    ita, itb = iter(a), iter(b)
    flip = False
    EMPTY = object()
    try:
        x, y = next(ita), next(itb)
        yield x, y
    except StopIteration:
        return
    while True:
        flip = not flip
        if flip:
            current = y = next(itb, EMPTY)
        else:
            current = x = next(ita, EMPTY)
        if current is EMPTY:
            return
        yield x, y

How can I remove duplicate tuples from a list based on index value of tuple while maintaining the order of tuple? [duplicate]

This question already has answers here:
How do I remove duplicates from a list, while preserving order?
(30 answers)
Closed 4 years ago.
I want to remove the tuples that have the same value at index 0 as an earlier tuple, keeping only the first occurrence. I looked at other similar questions but did not find the particular answer I am looking for. Can somebody please help me?
Below is what I tried.
from itertools import groupby
import random
Newlist = []
abc = [(1,2,3), (2,3,4), (1,0,3),(0,2,0), (2,4,5),(5,4,3), (0,4,1)]
Newlist = [random.choice(tuple(g)) for _, g in groupby(abc, key=lambda x: x[0])]
print(Newlist)
My expected output: [(1,2,3), (2,3,4), (0,2,0), (5,4,3)]
A simple way is to loop over the list and keep track of which elements you've already found:
abc = [(1,2,3), (2,3,4), (1,0,3),(0,2,0), (2,4,5),(5,4,3), (0,4,1)]
found = set()
NewList = []
for a in abc:
    if a[0] not in found:
        NewList.append(a)
    found.add(a[0])
print(NewList)
#[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
found is a set. At each iteration we check if the first element in the tuple is already in found. If not, we append the whole tuple to NewList. At the end of each iteration we add the first element of the tuple to found.
A better alternative using OrderedDict:
from collections import OrderedDict
abc = [(1,2,3), (2,3,4), (1,0,3), (0,2,0), (2,4,5),(5,4,3), (0,4,1)]
d = OrderedDict()
for t in abc:
    d.setdefault(t[0], t)
abc_unique = list(d.values())
print(abc_unique)
Output:
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
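On Python 3.7 and later, a plain dict also preserves insertion order, so the same idea works without the import. This is a minimal sketch under that version assumption; first_by_key and its key parameter are hypothetical names, not from the answer above.

```python
def first_by_key(rows, key=lambda row: row[0]):
    """Keep the first row seen for each key, in original order (Python 3.7+)."""
    first_seen = {}
    for row in rows:
        # setdefault stores the row only if this key has not been seen yet.
        first_seen.setdefault(key(row), row)
    return list(first_seen.values())

abc = [(1, 2, 3), (2, 3, 4), (1, 0, 3), (0, 2, 0), (2, 4, 5), (5, 4, 3), (0, 4, 1)]
print(first_by_key(abc))  # [(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
```

As with the set-based versions, this assumes the key values are hashable.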
Simple although not very efficient:
abc = [(1,2,3), (2,3,4), (1,0,3), (0,2,0), (2,4,5),(5,4,3), (0,4,1)]
abc_unique = [t for i, t in enumerate(abc) if not any(t[0] == p[0] for p in abc[:i])]
print(abc_unique)
Output:
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
The itertools recipes (Python 2: itertools recipes, but basically no difference in this case) contain a recipe for this, which is a bit more general than the implementation by @pault. It also uses a set:
Python 2:
from itertools import ifilterfalse as filterfalse
Python 3:
from itertools import filterfalse
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Use it with:
abc = [(1,2,3), (2,3,4), (1,0,3),(0,2,0), (2,4,5),(5,4,3), (0,4,1)]
Newlist = list(unique_everseen(abc, key=lambda x: x[0]))
print(Newlist)
# [(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
This should be slightly faster because of the caching of the set.add method (only really relevant if your abc is large) and should also be more general because it makes the key function a parameter.
Apart from that, the same limitation I already mentioned in a comment applies: this only works if the first element of the tuple is actually hashable (which numbers, like in the given example, are, of course).
@PatrickHaugh claims:
but the question is explicitly about maintaining the order of the
tuples. I don't think there's a solution using groupby
I never miss an opportunity to (ab)use groupby(). Here's my solution sans sorting (once or twice):
from itertools import groupby, chain
abc = [(1, 2, 3), (2, 3, 4), (1, 0, 3), (0, 2, 0), (2, 4, 5), (5, 4, 3), (0, 4, 1)]
Newlist = list((lambda s: chain.from_iterable(g for f, g in groupby(abc, lambda k: s.get(k[0]) != s.setdefault(k[0], True)) if f))({}))
print(Newlist)
OUTPUT
% python3 test.py
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
%
For groupby to put all tuples with the same key into one group, the sequence must be sorted by that key:
>>> [next(g) for k,g in groupby(sorted(abc, key=lambda x:x[0]), key=lambda x:x[0])]
[(0, 2, 0), (1, 2, 3), (2, 3, 4), (5, 4, 3)]
or if you need that very exact order of your example (i.e. maintaining original order):
>>> [t[2:] for t in sorted([next(g) for k,g in groupby(sorted([(t[0], i)+t for i,t in enumerate(abc)]), lambda x:x[0])], key=lambda x:x[1])]
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
The trick here is to add a field that records each tuple's original position, so that the original order can be restored after the groupby() step.
Edit: even a bit shorter:
>>> [t[1:] for t in sorted([next(g)[1:] for k,g in groupby(sorted([(t[0], i)+t for i,t in enumerate(abc)]), lambda x:x[0])])]
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
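The one-liners above pack several steps together. Spelled out, the decorate, sort, group, and restore steps might look like the following sketch; first_per_key_keep_order is a hypothetical name, and the intermediate tuple layout (key, index, row) is chosen here for readability rather than taken from the answer.

```python
from itertools import groupby

def first_per_key_keep_order(rows):
    """Keep the first row per row[0] using groupby, then restore input order."""
    # Decorate: remember each row's original position next to its key.
    decorated = sorted((row[0], i, row) for i, row in enumerate(rows))
    # Group by key; within a group the smallest index comes first.
    firsts = [next(group) for _, group in groupby(decorated, key=lambda d: d[0])]
    # Restore the original order using the stored index.
    firsts.sort(key=lambda d: d[1])
    # Undecorate: drop the key and the index.
    return [row for _, _, row in firsts]

abc = [(1, 2, 3), (2, 3, 4), (1, 0, 3), (0, 2, 0), (2, 4, 5), (5, 4, 3), (0, 4, 1)]
print(first_per_key_keep_order(abc))  # [(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
```

Because the index is unique, the sort never has to compare the rows themselves.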

Python: If statement inside list comprehension on a generator

Python 3.6
Consider this code:
from itertools import groupby
result = [list(group) for key, group in groupby(range(5,15), key= lambda x: str(x)[0])]
print(result)
outputs:
[[5], [6], [7], [8], [9], [10, 11, 12, 13, 14]]
Can I filter out the lists with len < 2 inside the list comprehension?
Update:
Given the two excellent answers, I felt it might be worth running a benchmark:
import timeit
t1 = timeit.timeit('''
from itertools import groupby
result = [group_list for group_list in (list(group) for key, group in groupby(range(5,15), key= lambda x: str(x)[0])) if len(group_list) >= 2]
''', number=1000000)
print(t1)
t2 = timeit.timeit('''
from itertools import groupby
list(filter(lambda group: len(group) >= 2, map(lambda key_group: list(key_group[1]),groupby(range(5,15), key=lambda x: str(x)[0]))))
''', number=1000000)
print(t2)
Results:
8.74591397369441
9.647086477861325
Looks like the list comprehension has an edge.
Yes
A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it. For example, this listcomp combines the elements of two lists if they are not equal:
>>> [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y]
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
and it’s equivalent to:
>>> combs = []
>>> for x in [1,2,3]:
...     for y in [3,1,4]:
...         if x != y:
...             combs.append((x, y))
...
>>> combs
[(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)]
Note how the order of the for and if statements is the same in both these snippets.
Since calling list(group) twice doesn't work in your particular example (as it consumes the generator yielded by groupby), you can introduce a temporary variable in your list comprehension by using a generator expression:
>>> [group_list for group_list in (list(group) for key, group in groupby(range(5,15), key= lambda x: str(x)[0])) if len(group_list) >= 2]
[[10, 11, 12, 13, 14]]
Alternately, using filter, map, and list:
>>> list(
...     filter(lambda group: len(group) >= 2,
...         map(lambda key_group: list(key_group[1]),
...             groupby(range(5,15), key=lambda x: str(x)[0])
...         )
...     )
... )
[[10, 11, 12, 13, 14]]
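The point about list(group) consuming the group can be checked directly. The sketch below first shows a second list() call coming back empty, then a variant of the comprehension that binds each materialised group to a name with a one-element inner loop. The names first_digit, first_pass, second_pass, and members are illustrative only.

```python
from itertools import groupby

def first_digit(n):
    return str(n)[0]

# Take the first group only and try to materialise it twice.
_, group = next(groupby(range(5, 15), key=first_digit))
first_pass = list(group)   # [5]
second_pass = list(group)  # []: the group iterator is already exhausted

# "for members in [list(group)]" runs exactly once per group and gives the
# materialised list a name that both the filter and the result can use.
result = [
    members
    for _, group in groupby(range(5, 15), key=first_digit)
    for members in [list(group)]
    if len(members) >= 2
]
print(first_pass, second_pass, result)  # [5] [] [[10, 11, 12, 13, 14]]
```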

Attempting Python list comprehension with two variable of different ranges

I'm trying to generate a list quickly with content from two different arrays of size n and n/2. As an example:
A = [70, 60, 50, 40, 30, 20, 10, 0]
B = [1, 2, 3, 4]
I wish to generate something like
[(A[x], B[y]) for x in range(len(A)) for y in range(len(B))]
I understand the second for statement is the nested for loop after the "x" one. I'm trying to get the contents of the new array to be
A[0], B[0]
A[1], B[1]
A[2], B[2]
A[3], B[3]
A[4], B[0]
A[5], B[1]
A[6], B[2]
A[7], B[3]
Could anyone point me in the right direction?
Don't use nested loops; you are pairing up A and B, with B repeating as needed. What you need is zip() (to do the pairing), and itertools.cycle() (to repeat B):
from itertools import cycle
zip(A, cycle(B))
If B is always going to be half the size of A, you could also just double B:
zip(A, B + B)
Demo:
>>> from itertools import cycle
>>> A = [70, 60, 50, 40, 30, 20, 10, 0]
>>> B = [1, 2, 3, 4]
>>> list(zip(A, cycle(B)))  # list() is needed on Python 3, where zip is lazy
[(70, 1), (60, 2), (50, 3), (40, 4), (30, 1), (20, 2), (10, 3), (0, 4)]
>>> list(zip(A, B + B))
[(70, 1), (60, 2), (50, 3), (40, 4), (30, 1), (20, 2), (10, 3), (0, 4)]
For cases where it is not known which one is the longer list, you could use min() and max() to pick which one to cycle:
zip(max((A, B), key=len), cycle(min((A, B), key=len)))
or for an arbitrary number of lists to pair up, cycle them all but use itertools.islice() to limit things to the maximum length:
inputs = (A, B) # potentially more
max_length = max(len(elem) for elem in inputs)
zip(*(islice(cycle(elem), max_length) for elem in inputs))
Demo:
>>> from itertools import islice
>>> inputs = (A, B) # potentially more
>>> max_length = max(len(elem) for elem in inputs)
>>> list(zip(*(islice(cycle(elem), max_length) for elem in inputs)))
[(70, 1), (60, 2), (50, 3), (40, 4), (30, 1), (20, 2), (10, 3), (0, 4)]
[(A[x % len(A)], B[x % len(B)]) for x in range(max(len(A), len(B)))]
This will work whether or not A is the larger list. :)
Try using only one for loop instead of two and having the second wrap back to 0 once it gets past its length.
[(A[x], B[x%len(B)]) for x in range(len(A))]
Note that this will only work if A is the longer list. If you know B will always be half the size of A you can also use this:
list(zip(A, B*2))

Accessing consecutive items when using a generator

Let's say I have a tuple generator, which I simulate as follows:
g = (x for x in (1,2,3,97,98,99))
For this specific generator, I wish to write a function to output the following:
(1,2,3)
(2,3,97)
(3,97,98)
(97,98,99)
(98,99)
(99)
So I'm iterating over three consecutive items at a time and printing them, except when I approach the end.
Should the first line in my function be:
t = tuple(g)
In other words, is it best to work on a tuple directly, or might it be beneficial to work with a generator? If both approaches are possible, please state the benefits and disadvantages of each. Also, if the generator approach is the wiser one, how might such a solution look?
Here's what I currently do:
def f(data, l):
    t = tuple(data)
    for j in range(len(t)):
        print(t[j:j+l])
data = (x for x in (1,2,3,4,5))
f(data,3)
UPDATE:
Note that I've updated my function to take a second argument specifying the length of the window.
A specific example for returning three items could read
def yield3(gen):
    b, c = next(gen), next(gen)
    try:
        while True:
            a, b, c = b, c, next(gen)
            yield (a, b, c)
    except StopIteration:
        yield (b, c)
        yield (c,)
g = (x for x in (1,2,3,97,98,99))
for l in yield3(g):
    print(l)
Actually, there are functions for this in the itertools module: tee() and zip_longest() (izip_longest() in Python 2):
>>> from itertools import zip_longest, tee
>>> g = (x for x in (1,2,3,97,98,99))
>>> a, b, c = tee(g, 3)
>>> next(b, None)
1
>>> next(c, None)
1
>>> next(c, None)
2
>>> [[x for x in l if x is not None] for l in zip_longest(a, b, c)]
[[1, 2, 3], [2, 3, 97], [3, 97, 98], [97, 98, 99], [98, 99], [99]]
from documentation:
Return n independent iterators from a single iterable. Equivalent to:
def tee(iterable, n=2):
    it = iter(iterable)
    deques = [collections.deque() for i in range(n)]
    def gen(mydeque):
        while True:
            if not mydeque: # when the local deque is empty
                newval = next(it) # fetch a new value and
                for d in deques: # load it to all the deques
                    d.append(newval)
            yield mydeque.popleft()
    return tuple(gen(d) for d in deques)
If you might need to take more than three elements at a time, and you don't want to load the whole generator into memory, I suggest using a deque from the collections module in the standard library to store the current set of items. A deque (pronounced "deck" and meaning "double-ended queue") can have values pushed and popped efficiently from both ends.
from collections import deque
from itertools import islice
def get_tuples(gen, n):
    q = deque(islice(gen, n)) # pre-load the queue with `n` values
    while q: # run until the queue is empty
        yield tuple(q) # yield a tuple copied from the current queue
        q.popleft() # remove the oldest value from the queue
        try:
            q.append(next(gen)) # try to add a new value from the generator
        except StopIteration:
            pass # but we don't care if there are none left
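Because get_tuples calls next(gen) directly, it expects an iterator, so passing a list or tuple would fail. A small variation that accepts any iterable could look like the sketch below; sliding_tuples and the _END sentinel are hypothetical names, and the approach is otherwise the same deque idea.

```python
from collections import deque
from itertools import islice

_END = object()  # sentinel marking an exhausted iterator

def sliding_tuples(iterable, n):
    """Yield overlapping n-tuples, then the shrinking tail, from any iterable."""
    it = iter(iterable)              # works for lists and tuples as well as generators
    window = deque(islice(it, n))    # pre-load up to n items
    while window:
        yield tuple(window)
        window.popleft()             # drop the oldest item
        item = next(it, _END)        # try to pull one more item
        if item is not _END:
            window.append(item)

print(list(sliding_tuples((1, 2, 3, 97, 98, 99), 3)))
# [(1, 2, 3), (2, 3, 97), (3, 97, 98), (97, 98, 99), (98, 99), (99,)]
```

Only n items are held at any time, so the input never has to be loaded fully into memory.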
Actually, it depends.
A generator is useful for very large collections, where you don't need to store everything in memory to get the result you want.
On the other hand, since you have to print everything, it seems safe to guess that the collection isn't huge, so it doesn't make a difference.
However, here is a generator that achieves what you are looking for:
def part(gen, size):
    t = tuple()
    try:
        while True:
            l = next(gen)
            if len(t) < size:
                t = t + (l,)
                if len(t) == size:
                    yield t
                continue
            if len(t) == size:
                t = t[1:] + (l,)
                yield t
                continue
    except StopIteration:
        while len(t) > 1:
            t = t[1:]
            yield t
>>> a = (x for x in range(10))
>>> list(part(a, 3))
[(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (4, 5, 6), (5, 6, 7), (6, 7, 8), (7, 8, 9), (8, 9), (9,)]
>>> a = (x for x in range(10))
>>> list(part(a, 5))
[(0, 1, 2, 3, 4), (1, 2, 3, 4, 5), (2, 3, 4, 5, 6), (3, 4, 5, 6, 7), (4, 5, 6, 7, 8), (5, 6, 7, 8, 9), (6, 7, 8, 9), (7, 8, 9), (8, 9), (9,)]
>>>
Note: the code isn't very elegant, but it also works when you need windows of, say, 5 items.
It's definitely best to work with the generator because you don't want to have to hold everything in memory.
It can be done very simply with a deque.
from collections import deque
from itertools import islice
def overlapping_chunks(size, iterable, *, head=False, tail=False):
    """
    Get overlapping subsections of an iterable of a specified size.

        print(*overlapping_chunks(3, (1,2,3,97,98,99)))
        #>>> [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99]

    If head is truthy, the "warm up" before the specified maximum
    number of items is included.

        print(*overlapping_chunks(3, (1,2,3,97,98,99), head=True))
        #>>> [1] [1, 2] [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99]

    If tail is truthy, the "cool down" after the iterable is exhausted
    is included.

        print(*overlapping_chunks(3, (1,2,3,97,98,99), tail=True))
        #>>> [1, 2, 3] [2, 3, 97] [3, 97, 98] [97, 98, 99] [98, 99] [99]
    """
    chunker = deque(maxlen=size)
    iterator = iter(iterable)
    for item in islice(iterator, size-1):
        chunker.append(item)
        if head:
            yield list(chunker)
    for item in iterator:
        chunker.append(item)
        yield list(chunker)
    if tail:
        while len(chunker) > 1:
            chunker.popleft()
            yield list(chunker)
I think what you currently do seems a lot easier than any of the above. If there isn't any particular need to make it more complicated, my opinion would be to keep it simple. In other words, it is best to work on a tuple directly.
Here's a generator that works in both Python 2.7.17 and 3.8.1. Internally it uses iterators and generators whenever possible, so it should be relatively memory efficient.
try:
    from itertools import izip, izip_longest, takewhile
except ImportError: # Python 3
    izip = zip
    from itertools import zip_longest as izip_longest, takewhile
def tuple_window(n, iterable):
    iterators = [iter(iterable) for _ in range(n)]
    for n, iterator in enumerate(iterators):
        for _ in range(n):
            next(iterator)
    _NULL = object() # Unique singleton object.
    for t in izip_longest(*iterators, fillvalue=_NULL):
        yield tuple(takewhile(lambda v: v is not _NULL, t))
if __name__ == '__main__':
    data = (1, 2, 3, 97, 98, 99)
    for t in tuple_window(3, data):
        print(t)
Output:
(1, 2, 3)
(2, 3, 97)
(3, 97, 98)
(97, 98, 99)
(98, 99)
(99,)
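One caveat: tuple_window builds its iterators with iter(iterable), which yields independent iterators for a sequence such as a tuple but returns the very same object each time for a one-shot generator. A Python 3 sketch that uses itertools.tee instead, so generators are handled as well, might look like this; tuple_window_tee is a hypothetical name.

```python
from itertools import takewhile, tee, zip_longest

def tuple_window_tee(n, iterable):
    """Sliding windows of up to n items; also works for one-shot generators."""
    iterators = tee(iterable, n)          # n independent views of the same stream
    for offset, iterator in enumerate(iterators):
        for _ in range(offset):
            next(iterator, None)          # advance the k-th view by k items
    _NULL = object()                      # unique fill value for exhausted views
    for t in zip_longest(*iterators, fillvalue=_NULL):
        # Keep only the leading real items of each padded tuple.
        yield tuple(takewhile(lambda v: v is not _NULL, t))

gen = (x for x in (1, 2, 3, 97, 98, 99))
print(list(tuple_window_tee(3, gen)))
# [(1, 2, 3), (2, 3, 97), (3, 97, 98), (97, 98, 99), (98, 99), (99,)]
```

tee buffers only the items that some view has not yet consumed, which here is at most n - 1 items.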
