Note: I know how I can do this of course in an explicit for loop but I am looking for a solution that is a bit more readable.
If possible, I'd like to solve this by using some of the built-in functionalities. Best case scenario is something like
result = [ *groupby logic* ]
Assuming the following list:
import numpy as np
np.random.seed(42)
N = 10
my_tuples = list(zip(np.random.choice(list('ABC'), size=N),
                     np.random.choice(range(100), size=N)))
where my_tuples is
[('C', 74),
('A', 74),
('C', 87),
('C', 99),
('A', 23),
('A', 2),
('C', 21),
('B', 52),
('C', 1),
('C', 87)]
How can I group the indices (integer value at index 1 of each tuple) by the labels A, B and C using groupby from itertools?
If I do something like this:
from itertools import groupby
#..
[(k,*v) for k, v in dict(groupby(my_tuples, lambda x: x[0])).items()]
I see that this delivers the wrong result.
The desired outcome should be
{
'A': [74, 23, 2],
# ..
}
The simplest solution is probably not to use groupby at all.
from collections import defaultdict
d = defaultdict(list)
for k, v in my_tuples:
    d[k].append(v)
The reason I wouldn't use groupby is because groupby(iterable) groups items in iterable that are adjacent. So to get all of the 'C' values together, you would first have to sort your list. Unless you have some reason to use groupby, it's unnecessary.
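To see the adjacency behaviour concretely, here is a minimal sketch using the first five tuples from the question (the name `pairs` is only for this illustration):

```python
from itertools import groupby
from operator import itemgetter

# First five tuples from the question; `pairs` is a name used only here.
pairs = [('C', 74), ('A', 74), ('C', 87), ('C', 99), ('A', 23)]

# Without sorting, groupby starts a new group every time the key changes,
# so the same label can show up more than once.
unsorted_groups = [(k, [v for _, v in g])
                   for k, g in groupby(pairs, key=itemgetter(0))]
print(unsorted_groups)
# [('C', [74]), ('A', [74]), ('C', [87, 99]), ('A', [23])]

# Sorting by label first makes equal keys adjacent: one group per label.
by_label = sorted(pairs, key=itemgetter(0))
sorted_groups = {k: [v for _, v in g]
                 for k, g in groupby(by_label, key=itemgetter(0))}
print(sorted_groups)
# {'A': [74, 23], 'C': [74, 87, 99]}
```

Because sorted is stable, the values inside each group keep their original relative order.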
You should use collections.defaultdict for an O(n) solution; see @PatrickHaugh's answer.
Using itertools.groupby requires sorting before grouping, incurring O(n log n) complexity:
from itertools import groupby
from operator import itemgetter
sorter = sorted(my_tuples, key=itemgetter(0))
grouper = groupby(sorter, key=itemgetter(0))
res = {k: list(map(itemgetter(1), v)) for k, v in grouper}
print(res)
{'A': [74, 23, 2],
'B': [52],
'C': [74, 87, 99, 21, 1, 87]}
How can I make Python's enumerate function count from larger numbers down to smaller ones (descending order, decrement, count down)? More generally, how can I use a different step (increment or decrement) in enumerate?
For example, such a function, applied to the list ['a', 'b', 'c'] with start value 10 and step -2, would produce an iterator yielding [(10, 'a'), (8, 'b'), (6, 'c')].
I haven't found a more elegant, idiomatic, and concise way than to write a simple generator:
def enumerate2(xs, start=0, step=1):
    for x in xs:
        yield (start, x)
        start += step
Examples:
>>> list(enumerate2([1,2,3], 5, -1))
[(5, 1), (4, 2), (3, 3)]
>>> list(enumerate2([1,2,3], 5, -2))
[(5, 1), (3, 2), (1, 3)]
If you don't understand the above code, read What does the "yield" keyword do in Python? and Difference between Python's Generators and Iterators.
One option is to zip your iterable to a range:
for index, item in zip(range(10, 0, -2), ['a', 'b', 'c']):
    ...
This does have the limitation that you need to know how far the range should go (the minimum it should cover - as in my example, excess will be truncated by zip).
If you don't know, you could roll your own "infinite range" (or just use itertools.count) and use that:
>>> def inf_range(start, step):
        """Generator function to provide a never-ending range."""
        while True:
            yield start
            start += step
>>> list(zip(inf_range(10, -2), ['a', 'b', 'c']))
[(10, 'a'), (8, 'b'), (6, 'c')]
Here is an idiomatic way to do that:
import itertools
list(zip(itertools.count(10, -2), 'abc'))
returns:
[(10, 'a'), (8, 'b'), (6, 'c')]
Another option is to use itertools.count, which is helpful for "enumerating" by a step, in reverse.
import itertools
counter = itertools.count(10, -2)
[(next(counter), letter) for letter in ["a", "b", "c"]]
# [(10, 'a'), (8, 'b'), (6, 'c')]
Characteristics
concise
the step and direction logic is compactly stored in count()
enumerated indices are iterated with next()
count() is inherently infinite; useful if the terminal boundary is unknown
(see @jonrsharpe)
the sequence length intrinsically terminates the infinite iterator
If you don't need to keep the iterator in a variable and just want to iterate through some container, multiply the index by the step.
container = ['a', 'b', 'c']
step = -2
for index, value in enumerate(container):
    print(f'{index * step}, {value}')
>>> 0, a
-2, b
-4, c
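If a non-zero starting value is also needed, the same arithmetic extends naturally. A small sketch, assuming a `start` variable that is not part of the answer above:

```python
container = ['a', 'b', 'c']
start = 10   # hypothetical starting value, not in the original answer
step = -2

# index runs 0, 1, 2, ... so start + index * step gives 10, 8, 6, ...
pairs = [(start + index * step, value)
         for index, value in enumerate(container)]
print(pairs)  # [(10, 'a'), (8, 'b'), (6, 'c')]
```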
It may not be very elegant, but with f-strings the following quick solution can be handy:
my_list = ['apple', 'banana', 'grapes', 'pear']
p=10
for counter, value in enumerate(my_list):
    print(f" {counter+p}, {value}")
    p+=9
> 10, apple
> 20, banana
> 30, grapes
> 40, pear
I have a list of 2-item tuples and I'd like to convert them to 2 lists where the first contains the first item in each tuple and the second list holds the second item.
For example:
original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
# and I want to become...
result = (['a', 'b', 'c', 'd'], [1, 2, 3, 4])
Is there a builtin function that does that?
In 2.x, zip is its own inverse, provided you use the special * operator.
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
This is equivalent to calling zip with each element of the list as a separate argument:
zip(('a', 1), ('b', 2), ('c', 3), ('d', 4))
except the arguments are passed to zip directly (after being converted to a tuple), so there's no need to worry about the number of arguments getting too big.
In 3.x, zip returns a lazy iterator, but this is trivially converted:
>>> list(zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)]))
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
You could also do
result = ([ a for a,b in original ], [ b for a,b in original ])
It should scale better, especially if Python makes good on not expanding the list comprehensions unless needed.
(Incidentally, it makes a 2-tuple (pair) of lists, rather than a list of tuples, like zip does.)
If generators instead of actual lists are ok, this would do that:
result = (( a for a,b in original ), ( b for a,b in original ))
The generators don't munch through the list until you ask for each element, but on the other hand, they do keep references to the original list.
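Both points (laziness and the retained reference) can be observed in a short sketch; the names `firsts` and `consumed` are illustrative only:

```python
original = [('a', 1), ('b', 2)]
firsts = (a for a, b in original)

# Nothing has been consumed yet, and the generator still refers to
# `original`, so an element appended afterwards is seen during iteration.
original.append(('c', 3))
consumed = list(firsts)
print(consumed)  # ['a', 'b', 'c']
```

If that coupling to the source list is undesirable, the list comprehension version above takes its snapshot immediately.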
I like to use zip(*iterable) (which is the piece of code you're looking for) in my programs as so:
def unzip(iterable):
return zip(*iterable)
I find unzip more readable.
If you have lists that are not the same length, you may not want to use zip as per Patrick's answer. This works:
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
But with different length lists, zip truncates each item to the length of the shortest list:
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e')]
In Python 2, you can use map with None in place of a function to fill empty results with None:
>>> map(None, *[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e'), (1, 2, 3, 4, None)]
zip() is marginally faster though.
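The map(None, ...) idiom is specific to Python 2. In Python 3, the closest standard-library counterpart is itertools.zip_longest, which pads missing values with a fill value (None by default). A minimal sketch, with `rows` as an illustrative name:

```python
from itertools import zip_longest

rows = [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e',)]

# zip stops at the shortest tuple, so the second column is lost entirely.
truncated = list(zip(*rows))

# zip_longest keeps going and fills the gaps with None.
padded = list(zip_longest(*rows))

print(truncated)  # [('a', 'b', 'c', 'd', 'e')]
print(padded)     # [('a', 'b', 'c', 'd', 'e'), (1, 2, 3, 4, None)]
```

A different placeholder can be chosen with the fillvalue keyword argument.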
To get a tuple of lists, as in the question:
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple([list(tup) for tup in zip(*original)])
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])
To unpack the two lists into separate variables:
list1, list2 = [list(tup) for tup in zip(*original)]
Naive approach
def transpose_finite_iterable(iterable):
return zip(*iterable) # `itertools.izip` for Python 2 users
works fine for finite iterable (e.g. sequences like list/tuple/str) of (potentially infinite) iterables which can be illustrated like
| |a_00| |a_10| ... |a_n0| |
| |a_01| |a_11| ... |a_n1| |
| |... | |... | ... |... | |
| |a_0i| |a_1i| ... |a_ni| |
| |... | |... | ... |... | |
where
n in ℕ,
a_ij corresponds to j-th element of i-th iterable,
and after applying transpose_finite_iterable we get
| |a_00| |a_01| ... |a_0i| ... |
| |a_10| |a_11| ... |a_1i| ... |
| |... | |... | ... |... | ... |
| |a_n0| |a_n1| ... |a_ni| ... |
Python example of such case where a_ij == j, n == 2
>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterable(iterable)
>>> next(result)
(0, 0)
>>> next(result)
(1, 1)
But we can't use transpose_finite_iterable again to return to structure of original iterable because result is an infinite iterable of finite iterables (tuples in our case):
>>> transpose_finite_iterable(result)
... hangs ...
Traceback (most recent call last):
File "...", line 1, in ...
File "...", line 2, in transpose_finite_iterable
MemoryError
So how can we deal with this case?
... and here comes the deque
The docs for the itertools.tee function include a Python recipe that, with some modification, can help in our case:
from collections import deque
def transpose_finite_iterables(iterable):
    iterator = iter(iterable)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]
    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()
    return tuple(map(coordinate, queues))
let's check
>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterables(transpose_finite_iterable(iterable))
>>> result
(<generator object transpose_finite_iterables.<locals>.coordinate at ...>, <generator object transpose_finite_iterables.<locals>.coordinate at ...>)
>>> next(result[0])
0
>>> next(result[0])
1
Synthesis
Now we can define a general function for working with iterables of iterables, some of which are finite and others potentially infinite, using the functools.singledispatch decorator:
from collections import (abc,
                         deque)
from functools import singledispatch
@singledispatch
def transpose(object_):
    """
    Transposes given object.
    """
    raise TypeError('Unsupported object type: {type}.'
                    .format(type=type(object_)))
@transpose.register(abc.Iterable)
def transpose_finite_iterables(object_):
    """
    Transposes given iterable of finite iterables.
    """
    iterator = iter(object_)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]
    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()
    return tuple(map(coordinate, queues))
def transpose_finite_iterable(object_):
    """
    Transposes given finite iterable of iterables.
    """
    yield from zip(*object_)
try:
    transpose.register(abc.Collection, transpose_finite_iterable)
except AttributeError:
    # Python3.5-
    transpose.register(abc.Mapping, transpose_finite_iterable)
    transpose.register(abc.Sequence, transpose_finite_iterable)
    transpose.register(abc.Set, transpose_finite_iterable)
which can be considered as its own inverse (mathematicians call this kind of functions "involutions") in class of binary operators over finite non-empty iterables.
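For the simple finite case, that round trip can be checked with plain zip. This sketch deliberately uses zip directly rather than the dispatching transpose function, and `rows` is an illustrative name:

```python
rows = [(1, 2, 3), (4, 5, 6)]

once = list(zip(*rows))    # columns of the original
twice = list(zip(*once))   # transposing again restores the rows

print(once)   # [(1, 4), (2, 5), (3, 6)]
print(twice)  # [(1, 2, 3), (4, 5, 6)]
```

The round trip only restores the input exactly when the inner items are tuples of equal length, since zip always yields tuples and truncates to the shortest input.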
As a bonus of singledispatching we can handle numpy arrays like
import numpy as np
...
transpose.register(np.ndarray, np.transpose)
and then use it like
>>> array = np.arange(4).reshape((2,2))
>>> array
array([[0, 1],
[2, 3]])
>>> transpose(array)
array([[0, 2],
[1, 3]])
Note
Since transpose returns iterators, anyone who wants a tuple of lists as in the OP can additionally apply the built-in map function:
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple(map(list, transpose(original)))
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])
Advertisement
I've added generalized solution to lz package from 0.5.0 version which can be used like
>>> from lz.transposition import transpose
>>> list(map(tuple, transpose(zip(range(10), range(10, 20)))))
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)]
P.S.
There is no solution (at least obvious) for handling potentially infinite iterable of potentially infinite iterables, but this case is less common though.
It's only another way to do it but it helped me a lot so I write it here:
Having this data structure:
X=[1,2,3,4]
Y=['a','b','c','d']
XY=zip(X,Y)
Resulting in:
In: XY
Out: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
The more pythonic way to unzip it and go back to the original is this one in my opinion:
x,y=zip(*XY)
But this returns tuples, so if you need lists you can use:
x,y=(list(x),list(y))
Consider using more_itertools.unzip:
>>> from more_itertools import unzip
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> [list(x) for x in unzip(original)]
[['a', 'b', 'c', 'd'], [1, 2, 3, 4]]
None of the previous answers efficiently provide the required output, which is a tuple of lists, rather than a list of tuples. For the former, you can use tuple with map. Here's the difference:
res1 = list(zip(*original)) # [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
res2 = tuple(map(list, zip(*original))) # (['a', 'b', 'c', 'd'], [1, 2, 3, 4])
In addition, most of the previous solutions assume Python 2.7, where zip returns a list rather than an iterator.
For Python 3.x, you will need to pass the result to a function such as list or tuple to exhaust the iterator. For memory-efficient iterators, you can omit the outer list and tuple calls for the respective solutions.
While numpy arrays and pandas may be preferable, this function imitates the behavior of zip(*args) when called as unzip(args).
It allows generators, like the result from zip in Python 3, to be passed as args, because it iterates through the values.
def unzip(items, cls=list, ocls=tuple):
    """Zip function in reverse.
    :param items: Zipped-like iterable.
    :type items: iterable
    :param cls: Container factory. Callable that returns iterable containers,
        with a callable append attribute, to store the unzipped items. Defaults
        to ``list``.
    :type cls: callable, optional
    :param ocls: Outer container factory. Callable that returns iterable
        containers, to store the inner containers (see ``cls``). Defaults
        to ``tuple``.
    :type ocls: callable, optional
    :returns: Unzipped items in instances returned from ``cls``, in an instance
        returned from ``ocls``.
    """
    # iter() will return the same iterator passed to it whenever possible.
    items = iter(items)
    try:
        i = next(items)
    except StopIteration:
        return ocls()
    unzipped = ocls(cls([v]) for v in i)
    for i in items:
        for c, v in zip(unzipped, i):
            c.append(v)
    return unzipped
To use list containers, simply run unzip(zipped), as
unzip(zip(["a","b","c"],[1,2,3])) == (["a","b","c"],[1,2,3])
To use deques, or any other container sporting append, pass a factory function.
from collections import deque
unzip([("a",1),("b",2)], deque, list) == [deque(["a","b"]),deque([1,2])]
(Decorate cls and/or ocls to micromanage container initialization, as briefly shown in the final example above.)
Since it returns tuples (and can use tons of memory), the zip(*zipped) trick seems more clever than useful, to me.
Here's a function that will actually give you the inverse of zip.
def unzip(zipped):
    """Inverse of built-in zip function.
    Args:
        zipped: a list of tuples
    Returns:
        a tuple of lists
    Example:
        a = [1, 2, 3]
        b = [4, 5, 6]
        zipped = list(zip(a, b))
        assert zipped == [(1, 4), (2, 5), (3, 6)]
        unzipped = unzip(zipped)
        assert unzipped == ([1, 2, 3], [4, 5, 6])
    """
    unzipped = ()
    if len(zipped) == 0:
        return unzipped
    dim = len(zipped[0])
    for i in range(dim):
        unzipped = unzipped + ([tup[i] for tup in zipped], )
    return unzipped
While zip(*seq) is very useful, it may be unsuitable for very long sequences as it will create a tuple of values to be passed in. For example, I've been working with a coordinate system with over a million entries and find it significantly faster to create the sequences directly.
A generic approach would be something like this:
from collections import deque
seq = ((a1, b1, …), (a2, b2, …), …)
width = len(seq[0])
output = [deque() for _ in range(width)]  # one independent deque per column
for element in seq:
    for s, item in zip(output, element):
        s.append(item)
But, depending on what you want to do with the result, the choice of collection can make a big difference. In my actual use case, using sets and no internal loop, is noticeably faster than all other approaches.
And, as others have noted, if you are doing this with datasets, it might make sense to use Numpy or Pandas collections instead.
Just to summarize:
# data
a = ('a', 'b', 'c', 'd')
b = (1, 2, 3, 4)
# forward
zipped = zip(a, b) # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
# reverse
a_, b_ = zip(*zipped)
# verify
assert a == a_
assert b == b_
Here's a simple one-line answer that produces the desired output:
original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
list(zip(*original))
# [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]