itertools.groupby: iterate over groups pairwise


How can I iterate over groupby results in pairs? What I tried isn't quite working:
from itertools import groupby,izip
groups = groupby([(1,2,3),(1,2),(1,2),(3,4,5),(3,4)],key=len)
def grouped(iterable, n):
    return izip(*[iterable]*n)
for g, gg in grouped(groups,2):
    print list(g[1]), list(gg[1])
Output I get:
[] [(1, 2), (1, 2)]
[] [(3, 4)]
Output I would like to have:
[(1, 2, 3)] [(1, 2), (1, 2)]
[(3, 4, 5)] [(3, 4)]

import itertools as IT
groups = IT.groupby([(1,2,3),(1,2),(1,2),(3,4,5),(3,4)], key=len)
groups = (list(group) for key, group in groups)
def grouped(iterable, n):
    return IT.izip(*[iterable]*n)
for p1, p2 in grouped(groups, 2):
    print p1, p2
yields
[(1, 2, 3)] [(1, 2), (1, 2)]
[(3, 4, 5)] [(3, 4)]
The code you posted is very interesting. It has a mundane problem, and a subtle problem.
The mundane problem is that itertools.groupby returns an iterator which outputs both a key and a group on each iteration.
Since you are interested in only the groups, not the keys, you need something like
groups = (group for key, group in groups)
The subtle problem is more difficult to explain -- I'm not really sure I understand it fully. Here is my guess: The iterator returned by groupby has turned its input,
[(1,2,3),(1,2),(1,2),(3,4,5),(3,4)]
into an iterator. The groupby iterator wraps the underlying data iterator, much as a csv.reader wraps an underlying file object iterator. You get one pass through this iterator and one pass only. The itertools.izip function, in the process of pairing items in groups, causes the groups iterator to advance from the first item to the second. Since you only get one pass through the iterator, the first group's items have already been skipped over, so when you call list(g[1]) it is empty.
A not-so-satisfying fix to this problem is to convert the iterators in groups into lists:
groups = (list(group) for key, group in groups)
so itertools.izip will not prematurely consume them. Edit: On second thought, this fix is not so bad. groups remains an iterator, and only turns the group into a list as it is consumed.
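The consumption issue can be reproduced directly. Here is a minimal Python 3 sketch (the variable names are mine, not from the question) showing that a group comes back empty once groupby has moved on to the next one:

```python
from itertools import groupby

data = [(1, 2, 3), (1, 2), (1, 2), (3, 4, 5), (3, 4)]
groups = groupby(data, key=len)

_, first = next(groups)    # group of length-3 tuples, not read yet
_, second = next(groups)   # advancing groupby skips past the first group

stale = list(first)        # the first group's items are already gone
fresh = list(second)       # the current group is still readable

print(stale)
print(fresh)
```

Reading each group into a list before asking for the next one, as in the fix above, avoids the empty result.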

When you try to look at the second key from the groupby, you are forcing it to iterate that far into the source iterator. Since there is normally nowhere to store the items from the first group, they are simply discarded.
So now we understand why we'll need to make sure we've stored the items from the first group before we try to look at the key (or the items) of the second group.
Some people are sure to hate this, but
>>> groups = groupby([(1, 2, 3), (1, 2), (1, 2), (3, 4, 5), (3, 4)], key=len)
>>> for i, j in ((list(i[1]), list(next(groups)[1])) for i in groups):
...     print i, j
...
[(1, 2, 3)] [(1, 2), (1, 2)]
[(3, 4, 5)] [(3, 4)]
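For what it's worth, the same idea can be spelled out as an explicit generator in Python 3. This is only a sketch: group_pairs is a made-up name, and dropping a trailing unpaired group is my assumption about what you would want.

```python
from itertools import groupby

def group_pairs(iterable, key=None):
    """Yield consecutive groupby groups two at a time, as lists."""
    groups = groupby(iterable, key=key)
    for _, group in groups:
        first = list(group)        # store the items before advancing
        nxt = next(groups, None)   # advance to the partner group, if any
        if nxt is None:
            return                 # odd number of groups: drop the last one
        yield first, list(nxt[1])

data = [(1, 2, 3), (1, 2), (1, 2), (3, 4, 5), (3, 4)]
for p1, p2 in group_pairs(data, key=len):
    print(p1, p2)
```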

Related

Get list comprehension object

I'm committing a crime in python on purpose. This is a bad way of doing this.
The GOAL here is cursed code. All on one line.
I basically have what's below
with open("file") as f:
    [int(x) for x in [y for y in f.read().split()]]
I cannot use
with open("file") as f:
    a = f.read().split()
    [x for x in [(a[i-1], e) for i, e in enumerate(a) if i > 0] ...]
because the goal is to have this in one line (aside from the with open)
I would like to return from the original object the current element and either the previous one or the next one.
To illustrate it clearly.
a = [1, 2, 3, 4, 5]
After the illegal code runs, it would return
[(1, 2), (2, 3), (3, 4), (4, 5), (5, ?)]
So again the focus here is not production code. This is purely to see how much we can abuse the language.
So far I've found https://code.activestate.com/recipes/204297/ which references the use of locals() in Python 2; after mucking around with it I found that the interface is a little different now.
I've been able to get the object in memory but I don't know how to actually use this object now that I have it.
locals()['.0']
Most attributes seem to be missing, no __self__ to call.
Please share your most cursed ideas for this.

Normally, I would use tee and islice on the generator for something like this:
from itertools import tee, islice
with open("file") as f:
    a, b = tee(f.read().split())
    b = islice(b, 1, None)
    list(zip(a, b))
You can convert this into a one-liner using (abusing) the walrus operator (:=):
list(zip((k := tee(f.read().split()))[0], islice(k[1], 1, None)))
The result is
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6)]
If you want the last element to be padded, use zip_longest instead of zip:
from itertools import tee, islice, zip_longest
...
list(zip_longest((k := tee(f.read().split()))[0], islice(k[1], 1, None), fillvalue='?'))
The result in this case is
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, '?')]
The nice thing about using iterators rather than lists this way is that while f.read().split() is a sequence of known length, the tee and islice will work on any iterable, even if the length is unknown.
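To make that last point concrete, here is a small sketch wrapping the same tee/islice idea in a function and feeding it a generator of unknown length. with_next is a hypothetical helper name, and the token string stands in for the file contents.

```python
from itertools import islice, tee, zip_longest

def with_next(iterable, fillvalue=None):
    """Pair each element with its successor; pad the last pair with fillvalue."""
    a, b = tee(iterable)
    return zip_longest(a, islice(b, 1, None), fillvalue=fillvalue)

# A generator has no len(), yet the pairing still works.
tokens = (int(t) for t in "1 2 3 4 5".split())
result = list(with_next(tokens, fillvalue="?"))
print(result)
```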

Insert element to list based on previous and next elements

I'm trying to add a new tuple to a list of tuples (sorted by first element in tuple), where the new tuple contains elements from both the previous and the next element in the list.
Example:
oldList = [(3, 10), (4, 7), (5,5)]
newList = [(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]
(4,10) was constructed from and added in between (3,10) and (4,7).
Construct (x,y) from (a,y) and (x,b)
I've tried using enumerate() to insert at the specific position, but that doesn't really let me access the next element.

oldList = [(3, 10), (4, 7), (5,5)]
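def pair(lst):
    # create two iterators
    it1, it2 = iter(lst), iter(lst)
    # move second to the second tuple
    next(it2, None)
    for ele in it1:
        # yield original
        yield ele
        # stop after the last tuple; a bare next() here would raise
        # StopIteration, which becomes a RuntimeError in generators on Python 3.7+
        nxt = next(it2, None)
        if nxt is None:
            return
        # yield first ele from next and second from current
        yield (nxt[0], ele[1])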
Which will give you:
In [3]: oldList = [(3, 10), (4, 7), (5, 5)]
In [4]: list(pair(oldList))
Out[4]: [(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]
Obviously we need to do some error handling to handle different possible situations.
You could also do it using a single iterator if you prefer:
def pair(lst):
    it = iter(lst)
    prev = next(it)
    for ele in it:
        yield prev
        # first item from the next tuple, second item from the previous one
        yield (ele[0], prev[1])
        prev = ele
    yield prev
You can use itertools.tee in place of calling iter:
from itertools import tee
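def pair(lst):
    # create two iterators
    it1, it2 = tee(lst)
    # move second to the second tuple
    next(it2, None)
    for ele in it1:
        # yield original
        yield ele
        # stop after the last tuple; a bare next() here would raise
        # StopIteration, which becomes a RuntimeError in generators on Python 3.7+
        nxt = next(it2, None)
        if nxt is None:
            return
        # yield first ele from next and second from current
        yield (nxt[0], ele[1])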

You can use a list comprehension and itertools.chain():
>>> from itertools import chain
>>> list(chain.from_iterable([((i, j), (x, j)) for (i, j), (x, y) in zip(oldList, oldList[1:])])) + oldList[-1:]
[(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]

Not being a big fan of one-liners (or complexity) myself, I will propose a very explicit and readable (which is usually a good thing!) solution to your problem.
So, in a very simplistic approach, you could do this:
def insertElements(oldList):
    """
    Return a new list, alternating oldList tuples with
    new tuples in the form (oldList[i+1][0],oldList[i][1])
    """
    newList = []
    for i in range(len(oldList)-1):
        # take one tuple as is
        newList.append(oldList[i])
        # then add a new one with items from current and next tuple
        newList.append((oldList[i+1][0],oldList[i][1]))
    else:
        # don't forget the last tuple
        newList.append(oldList[-1])
    return newList
oldList = [(3, 10), (4, 7), (5, 5)]
newList = insertElements(oldList)
That will give you the desired result in newList:
print(newList)
[(3, 10), (4, 10), (4, 7), (5, 7), (5, 5)]
This is not much longer code than other more sophisticated (and memory efficient!) solutions, like using generators, AND I consider it a lot easier to read than intricate one-liners. Also, it would be easy to add some checks to this simple function (like making sure you have a list of tuples).
Unless you already know you need to optimize this particular piece of your code (assuming this is part of a bigger project), this should be good enough. At the same time it is: easy to implement, easy to read, easy to explain, easy to maintain, easy to extend, easy to refactor, etc.
Note: all other previous answers to your question are also better solutions than this simple one, in many ways. Just wanted to give you another choice. Hope this helps.

Pairwise circular Python 'for' loop

Is there a nice Pythonic way to loop over a list, returning a pair of elements? The last element should be paired with the first.
So for instance, if I have the list [1, 2, 3], I would like to get the following pairs:
1 - 2
2 - 3
3 - 1

A Pythonic way to access a list pairwise is: zip(L, L[1:]). To connect the last item to the first one:
>>> L = [1, 2, 3]
>>> zip(L, L[1:] + L[:1])
[(1, 2), (2, 3), (3, 1)]

I would use a deque with zip to achieve this.
>>> from collections import deque
>>>
>>> l = [1,2,3]
>>> d = deque(l)
>>> d.rotate(-1)
>>> zip(l, d)
[(1, 2), (2, 3), (3, 1)]

I'd use a slight modification to the pairwise recipe from the itertools documentation:
import itertools
def pairwise_circle(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ... (s<last>,s0)"
    a, b = itertools.tee(iterable)
    first_value = next(b, None)
    return itertools.zip_longest(a, b, fillvalue=first_value)
This will simply keep a reference to the first value and when the second iterator is exhausted, zip_longest will fill the last place with the first value.
(Also note that it works with iterators like generators as well as iterables like lists/tuples.)
Note that @Barry's solution is very similar to this but a bit easier to understand in my opinion and easier to extend beyond one element.

I would pair itertools.cycle with zip:
import itertools
def circular_pairwise(l):
    second = itertools.cycle(l)
    next(second)
    return zip(l, second)
cycle returns an iterable that yields the values of its argument in order, looping from the last value to the first.
We skip the first value, so it starts at position 1 (rather than 0).
Next, we zip it with the original, unmutated list. zip is good, because it stops when any of its argument iterables are exhausted.
Doing it this way avoids the creation of any intermediate lists: cycle holds a reference to the original, but doesn't copy it. zip operates in the same way.
It's important to note that this will break if the input is an iterator, such as a file, (or a map or zip in python-3), as advancing in one place (through next(second)) will automatically advance the iterator in all the others. This is easily solved using itertools.tee, which produces two independently operating iterators over the original iterable:
def circular_pairwise(it):
    first, snd = itertools.tee(it)
    second = itertools.cycle(snd)
    next(second)
    return zip(first, second)
tee can use large amounts of additional storage, for example, if one of the returned iterators is used up before the other is touched, but as we only ever have one step difference, the additional storage is minimal.
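A quick way to see the difference between the two versions is to feed each of them an iterator rather than a list. The function names below are mine, chosen only to tell the two variants apart; this is a Python 3 sketch.

```python
import itertools

def circular_pairwise_naive(l):
    # Shares one underlying iterator when `l` is itself an iterator.
    second = itertools.cycle(l)
    next(second)
    return zip(l, second)

def circular_pairwise_tee(it):
    # tee gives two independent iterators over the same source.
    first, snd = itertools.tee(it)
    second = itertools.cycle(snd)
    next(second)
    return zip(first, second)

from_list = list(circular_pairwise_naive([1, 2, 3]))
broken = list(circular_pairwise_naive(iter([1, 2, 3])))
fixed = list(circular_pairwise_tee(iter([1, 2, 3])))

print(from_list)
print(broken)   # pairs go missing because both sides drain the same iterator
print(fixed)
```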

There are more efficient ways (that don't build temporary lists), but I think this is the most concise:
> l = [1,2,3]
> zip(l, (l+l)[1:])
[(1, 2), (2, 3), (3, 1)]

Pairwise circular Python 'for' loop
If you like the accepted answer,
zip(L, L[1:] + L[:1])
you can go much more memory light with semantically the same code using itertools:
from itertools import islice, chain #, izip as zip # uncomment if Python 2
And this barely materializes anything in memory beyond the original list (assuming the list is relatively large):
zip(l, chain(islice(l, 1, None), islice(l, None, 1)))
To use, just consume (for example, with a list):
>>> list(zip(l, chain(islice(l, 1, None), islice(l, None, 1))))
[(1, 2), (2, 3), (3, 1)]
This can be made extensible to any width:
def cyclical_window(l, width=2):
    return zip(*[chain(islice(l, i, None), islice(l, None, i)) for i in range(width)])
and usage:
>>> l = [1, 2, 3, 4, 5]
>>> cyclical_window(l)
<itertools.izip object at 0x112E7D28>
>>> list(cyclical_window(l))
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
>>> list(cyclical_window(l, 4))
[(1, 2, 3, 4), (2, 3, 4, 5), (3, 4, 5, 1), (4, 5, 1, 2), (5, 1, 2, 3)]
Unlimited generation with itertools.tee with cycle
You can also use tee to avoid making a redundant cycle object:
from itertools import cycle, tee
ic1, ic2 = tee(cycle(l))
next(ic2) # must still queue up the next item
and now:
>>> [(next(ic1), next(ic2)) for _ in range(10)]
[(1, 2), (2, 3), (3, 1), (1, 2), (2, 3), (3, 1), (1, 2), (2, 3), (3, 1), (1, 2)]
This is incredibly efficient, an expected usage of iter with next, and elegant usage of cycle, tee, and zip.
Don't pass cycle directly to list unless you have saved your work and have time for your computer to creep to a halt as you max out its memory - if you're lucky, after a while your OS will kill the process before it crashes your computer.
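If you do want a list out of an endless iterator, itertools.islice lets you cap it first. A small sketch, assuming a non-empty input (calling next on a cycle over an empty sequence raises StopIteration); endless_pairs is a name I made up:

```python
from itertools import cycle, islice, tee

def endless_pairs(seq):
    """Infinite stream of circular pairs over a non-empty sequence."""
    ic1, ic2 = tee(cycle(seq))
    next(ic2)                 # offset the second iterator by one item
    return zip(ic1, ic2)

# islice stops after five pairs, so list() terminates safely.
first_five = list(islice(endless_pairs([1, 2, 3]), 5))
print(first_five)
```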
Pure Python Builtin Functions
Finally, no standard lib imports, but this only works for up to the length of original list (IndexError otherwise.)
>>> [(l[i], l[i - len(l) + 1]) for i in range(len(l))]
[(1, 2), (2, 3), (3, 1)]
You can continue this with modulo:
>>> len_l = len(l)
>>> [(l[i % len_l], l[(i + 1) % len_l]) for i in range(10)]
[(1, 2), (2, 3), (3, 1), (1, 2), (2, 3), (3, 1), (1, 2), (2, 3), (3, 1), (1, 2)]

I would use a list comprehension, and take advantage of the fact that l[-1] is the last element.
>>> l = [1,2,3]
>>> [(l[i-1],l[i]) for i in range(len(l))]
[(3, 1), (1, 2), (2, 3)]
You don't need a temporary list that way.

Amazing how many different ways there are to solve this problem.
Here's one more. You can use the pairwise recipe but instead of zipping with b, chain it with the first element that you already popped off. Don't need to cycle when we just need a single extra value:
from itertools import chain, izip, tee
def pairwise_circle(iterable):
    a, b = tee(iterable)
    first = next(b, None)
    return izip(a, chain(b, (first,)))

I like a solution that does not modify the original list and does not copy the list to temporary storage:
def circular(a_list):
    for index in range(len(a_list) - 1):
        yield a_list[index], a_list[index + 1]
    yield a_list[-1], a_list[0]
for x in circular([1, 2, 3]):
    print x
Output:
(1, 2)
(2, 3)
(3, 1)
I can imagine this being used on some very large in-memory data.

This one will work even if the list l has consumed most of the system's memory. (If something guarantees this case to be impossible, then zip as posted by chepner is fine)
l.append( l[0] )
for i in range( len(l)-1):
    pair = l[i],l[i+1]
    # stuff involving pair
del l[-1]
or more generalizably (works for any offset n i.e. l[ (i+n)%len(l) ] )
for i in range( len(l)):
    pair = l[i], l[ (i+1)%len(l) ]
    # stuff
provided you are on a system with decently fast modulo division (i.e. not some pea-brained embedded system).
There seems to be an often-held belief that indexing a list with an integer subscript is un-pythonic and best avoided. Why?

This is my solution, and it looks Pythonic enough to me:
l = [1,2,3]
for n,v in enumerate(l):
    try:
        print(v,l[n+1])
    except IndexError:
        print(v,l[0])
prints:
1 2
2 3
3 1
The generator function version:
def f(iterable):
    for n,v in enumerate(iterable):
        try:
            yield(v,iterable[n+1])
        except IndexError:
            yield(v,iterable[0])
>>> list(f([1,2,3]))
[(1, 2), (2, 3), (3, 1)]

How about this?
li = li+[li[0]]
pairwise = [(li[i],li[i+1]) for i in range(len(li)-1)]

from itertools import izip, chain, islice
itr = izip(l, chain(islice(l, 1, None), islice(l, 1)))
(As above with @j-f-sebastian's "zip" answer, but using itertools.)
NB: EDITED given a helpful nudge from @200_success. Previously it was:
itr = izip(l, chain(l[1:], l[:1]))

If you don't want to consume too much memory, you can try my solution:
[(l[i], l[(i+1) % len(l)]) for i, v in enumerate(l)]
It's a little slower, but consumes less memory.

Starting in Python 3.10, the new pairwise function provides a way to create sliding pairs of consecutive elements:
from itertools import pairwise
# l = [1, 2, 3]
list(pairwise(l + l[:1]))
# [(1, 2), (2, 3), (3, 1)]
or simply pairwise(l + l[:1]) if you don't need the result as a list.
Note that we pairwise on the list appended with its head (l + l[:1]) so that rolling pairs are circular (i.e. so that we also include the (3, 1) pair):
list(pairwise(l)) # [(1, 2), (2, 3)]
l + l[:1] # [1, 2, 3, 1]
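On versions before 3.10 that import fails, so here is a hedged fallback sketch. The replacement pairwise follows the tee-based recipe from the older itertools documentation; circular_pairs is a hypothetical helper name, and copying the input into a list first is my own choice so that it also accepts iterators.

```python
try:
    from itertools import pairwise  # available from Python 3.10
except ImportError:
    from itertools import tee

    def pairwise(iterable):
        """Fallback: s -> (s0, s1), (s1, s2), ..."""
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)

def circular_pairs(iterable):
    """Sliding pairs that wrap around from the last item to the first."""
    items = list(iterable)
    return pairwise(items + items[:1])

print(list(circular_pairs([1, 2, 3])))
```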

Just another try
>>> L = [1,2,3]
>>> zip(L,L[1:]) + [(L[-1],L[0])]
[(1, 2), (2, 3), (3, 1)]

L = [1, 2, 3]
a = zip(L, L[1:]+L[:1])
for i in a:
    b = list(i)
    print b

This seems like a job for combinations.
from itertools import combinations
x=combinations([1,2,3],2)
This yields an iterator, which can then be iterated over like so:
for i in x:
    print i
The results would look something like
(1, 2)
(1, 3)
(2, 3)

Unexpected behavior zipping an iterator with a sequence

While trying to solve a particular code golf question, I came across a scenario whose behavior I had difficulty understanding.
The scenario was zipping an iterator with a sequence: after the transpose operation, the iterator was one past the expected element.
>>> l = range(10)
>>> it = iter(l)
>>> zip(it, range(5))
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
>>> next(it) #expecting 5 here
6
Am I missing something obvious?
Note: Please provide credible references for answers that may not be obvious.

I suspect that the 5 is consumed when zip tries to fetch the next items. zip stops as soon as one of its arguments is "empty":
>>> l = range(10)
>>> it = iter(l)
>>> zip(range(5),it)
[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]
>>> it.next()
5
By reversing the order, zip knows that it can stop and does not consume the next item from it.

If you want references you can check the izip documentation. It gives an equivalent implementation:
def izip(*iterables):
    iterators = map(iter, iterables)
    while iterators:
        yield tuple(map(next, iterators))
Since list(izip(*args)) is expected to have same behavior as zip(*args), the result you got is actually the logical behavior.
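The same effect is easy to reproduce in Python 3, where zip is lazy and fetches from its arguments left to right, like izip. A small sketch (variable names are mine):

```python
it = iter(range(10))
pairs = list(zip(it, range(5)))          # zip pulls from `it` first, then finds range(5) exhausted
after_iterator_first = next(it)          # the 5 was already fetched and thrown away

it = iter(range(10))
pairs_swapped = list(zip(range(5), it))  # range(5) runs out first, so `it` is not touched again
after_sequence_first = next(it)

print(after_iterator_first, after_sequence_first)
```

Putting the shorter (or finite) argument first is what keeps the iterator from losing an element.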

Python Easiest Way to Sum List Intersection of List of Tuples

Let's say I have the following two lists of tuples
myList = [(1, 7), (3, 3), (5, 9)]
otherList = [(2, 4), (3, 5), (5, 2), (7, 8)]
returns => [(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
I would like to design a merge operation that merges these two lists by checking for any intersections on the first element of the tuple, if there are intersections, add the second elements of each tuple in question (merge the two). After the operation I would like to sort based upon the first element.
I am also posting this because I think it's a pretty common problem that has an obvious solution, but I feel that there could be very pythonic solutions to this question ;)

Use a dictionary for the result:
result = {}
for k, v in my_list + other_list:
    result[k] = result.get(k, 0) + v
If you want a list of tuples, you can get it via result.items(). The resulting list will be in arbitrary order, but of course you can sort it if desired.
(Note that I renamed your lists to conform with Python's style conventions.)
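Putting that together with the sort, a complete sketch might look like this. merge_sum is a made-up name, and accepting any number of lists is my own generalisation rather than something the question asked for.

```python
def merge_sum(*pair_lists):
    """Sum the second items of (key, value) pairs that share a key, sorted by key."""
    totals = {}
    for pairs in pair_lists:
        for k, v in pairs:
            totals[k] = totals.get(k, 0) + v
    return sorted(totals.items())

my_list = [(1, 7), (3, 3), (5, 9)]
other_list = [(2, 4), (3, 5), (5, 2), (7, 8)]
print(merge_sum(my_list, other_list))
```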

Use defaultdict:
from collections import defaultdict
results_dict = defaultdict(int)
results_dict.update(my_list)
for a, b in other_list:
    results_dict[a] += b
results = sorted(results_dict.items())
Note: When sorting sequences, sorted sorts by the first item in the sequence. If the first elements are the same, then it compares the second element. You can give sorted a function to sort by, using the key keyword argument:
results = sorted(results_dict.items(), key=lambda x: x[1]) #sort by the 2nd item
or
results = sorted(results_dict.items(), key=lambda x: abs(x[0])) #sort by absolute value

A method using itertools:
>>> myList = [(1, 7), (3, 3), (5, 9)]
>>> otherList = [(2, 4), (3, 5), (5, 2), (7, 8)]
>>> import itertools
>>> merged = []
>>> for k, g in itertools.groupby(sorted(myList + otherList), lambda e: e[0]):
...     merged.append((k, sum(e[1] for e in g)))
...
>>> merged
[(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
This first concatenates the two lists together and sorts it. itertools.groupby returns the elements of the merged list, grouped by the first element of the tuple, so it just sums them up and places it into the merged list.

>>> [(k, sum(v for x,v in myList + otherList if k == x)) for k in dict(myList + otherList).keys()]
[(1, 7), (2, 4), (3, 8), (5, 11), (7, 8)]
>>>
tested for both Python2.7 and 3.2
dict(myList + otherList).keys() returns an iterable containing a set of the keys for the joined lists
sum(...) takes 'k' to loop again through the joined list and add up tuple items 'v' where k == x
... but the extra looping adds processing overhead. Using an explicit dictionary as proposed by Sven Marnach avoids it.
