zip list with a single element - python

I have a list of some elements, e.g. [1, 2, 3, 4] and a single object, e.g. 'a'. I want to produce a list of tuples with the elements of the list in the first position and the single object in the second position: [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')].
I could do it with zip like this:
def zip_with_scalar(l, o):  # l - the list; o - the object
    return list(zip(l, [o] * len(l)))
However, this gives me the feeling of creating an unnecessary list of repeated elements.
Another possibility is
def zip_with_scalar(l, o):
    return [(i, o) for i in l]
which is very clean and pythonic indeed, but here I do the whole thing "manually". In Haskell I would do something like
zipWithScalar l o = zip l $ repeat o
Is there any built-in function or trick, either for zipping with a scalar or for something that would let me use an ordinary zip, i.e. a sort of infinite list?

This is the closest to your Haskell solution:
import itertools
def zip_with_scalar(l, o):
    return zip(l, itertools.repeat(o))
You could also use a generator expression, which, unlike a list comprehension, avoids building a list:
def zip_with_scalar(l, o):
    return ((i, o) for i in l)
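Both versions above return lazy iterators rather than lists. A minimal sketch (assuming Python 3) of the practical consequence: the result can be consumed only once, so wrap it in list() if you need to reuse it.

```python
from itertools import repeat


def zip_with_scalar(l, o):
    # repeat(o) is infinite; zip stops as soon as l runs out
    return zip(l, repeat(o))


pairs = zip_with_scalar([1, 2, 3, 4], 'a')
first_pass = list(pairs)   # consumes the iterator
second_pass = list(pairs)  # nothing left: the iterator is exhausted
print(first_pass)   # [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
print(second_pass)  # []

# Materialize once if the pairs are needed more than once
reusable = list(zip_with_scalar([1, 2, 3, 4], 'a'))
```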

You can use the built-in map function:
>>> elements = [1, 2, 3, 4]
>>> key = 'a'
>>> list(map(lambda e: (e, key), elements))
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]

This is a perfect job for the itertools.cycle class.
from itertools import cycle
def zip_with_scalar(l, o):
    return zip(l, cycle([o]))  # cycle iterates over its argument, so wrap o in a list
Demo:
>>> from itertools import cycle
>>> l = [1, 2, 3, 4]
>>> list(zip(l, cycle('a')))
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
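One caveat: cycle iterates over the elements of its argument, so cycle('a') only behaves like repeat('a') because 'a' is a one-character string. A small sketch comparing the options with a longer string (wrapping the object in a one-element list is one way to make cycle repeat the object itself):

```python
from itertools import cycle, repeat

l = [1, 2, 3]
# cycle walks over the characters of 'ab', alternating them
by_chars = list(zip(l, cycle('ab')))
# cycle over a one-element list repeats the whole object
by_object = list(zip(l, cycle(['ab'])))
# repeat never iterates over its argument, so no wrapping is needed
by_repeat = list(zip(l, repeat('ab')))
print(by_chars)   # [(1, 'a'), (2, 'b'), (3, 'a')]
print(by_object)  # [(1, 'ab'), (2, 'ab'), (3, 'ab')]
```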

>>> lst = [1, 2, 3, 4]
>>> tups = [(itm, 'a') for itm in lst]
>>> tups
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]

>>> l = [1, 2, 3, 4]
>>> list(zip(l, "a"*len(l)))
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]

You could also use zip_longest with a fillvalue of o:
from itertools import zip_longest
def zip_with_scalar(l, o):  # l - the list; o - the object
    return zip_longest(l, [o], fillvalue=o)
print(list(zip_with_scalar([1, 2, 3, 4], "a")))
Just be aware that any mutable values used for o won't be copied whether using zip_longest or repeat.
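To make the mutability warning concrete, here is a short sketch (Python 3 assumed): every tuple holds a reference to the same object, so mutating it through one pair shows up in all of them. It also shows an edge case of the zip_longest version: with an empty list it still produces one pair, because fillvalue pads the exhausted first iterable.

```python
from itertools import repeat, zip_longest

shared = []
pairs = list(zip([1, 2, 3], repeat(shared)))
pairs[0][1].append('x')  # mutate the list through the first pair
print(pairs)  # [(1, ['x']), (2, ['x']), (3, ['x'])]: one list, three references

# Edge case: zip_longest pads the *empty* first iterable with fillvalue
empty_case = list(zip_longest([], ['a'], fillvalue='a'))
print(empty_case)  # [('a', 'a')] rather than []
```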

The more-itertools library recently added a zip_broadcast() function that solves this problem well:
>>> from more_itertools import zip_broadcast
>>> list(zip_broadcast([1,2,3,4], 'a'))
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
This is a much more general solution than the other answers posted here:
Empty iterables are correctly handled.
There can be multiple iterable and/or scalar arguments.
The order of the scalar/iterable arguments doesn't need to be known.
If there are multiple iterable arguments, you can check that they are the same length with strict=True.
You can easily control whether or not strings should be treated as iterables (by default they are not).
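If adding the more-itertools dependency is not an option, the core idea can be approximated with the standard library. The sketch below is hypothetical: zip_broadcast_sketch is my own name, it is not the more-itertools implementation, and it only covers mixing scalars and iterables in any order, treating strings as scalars, and empty inputs (there is no strict= check).

```python
from itertools import repeat


def zip_broadcast_sketch(*objects, scalar_types=(str, bytes)):
    """Hypothetical stdlib-only approximation of a broadcasting zip."""

    def is_scalar(obj):
        # Strings and bytes count as scalars by default
        if isinstance(obj, scalar_types):
            return True
        try:
            iter(obj)
        except TypeError:
            return True  # not iterable at all, e.g. an int
        return False

    flags = [is_scalar(obj) for obj in objects]
    if all(flags):
        # Choice made by this sketch: only scalars -> a single tuple,
        # because zipping nothing but infinite repeats would never stop
        yield tuple(objects)
        return
    streams = [repeat(obj) if flag else obj for obj, flag in zip(objects, flags)]
    yield from zip(*streams)


print(list(zip_broadcast_sketch([1, 2, 3, 4], 'a')))
print(list(zip_broadcast_sketch('x', [1, 2], 0)))  # scalars in any position
print(list(zip_broadcast_sketch([], 'a')))  # empty iterable -> no tuples
```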

Just define a class implementing an infinite iterator, initialized with the single element you want to inject into the list:
class zipIterator:
    def __init__(self, val):
        self.__val = val
    def __iter__(self):
        return self
    def __next__(self):
        return self.__val
and then create your new list from this class and the lists you have:
elements = [1, 2, 3, 4]
key = 'a'
res = [it for it in zip(elements, zipIterator(key))]
the result would be:
>>> res
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
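The same infinite iterator can be written more compactly as a generator function. This is a sketch (forever is a hypothetical name) that behaves like itertools.repeat called without a times argument:

```python
def forever(val):
    # An infinite generator: zip() stops pulling from it when the list runs out
    while True:
        yield val


elements = [1, 2, 3, 4]
key = 'a'
res = list(zip(elements, forever(key)))
print(res)  # [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
```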

Related

why does itertools.count() consume an extra element when used with zip?

I was trying to use functools.partial with itertools.count, by currying zip with itertools.count():
g = functools.partial(zip, itertools.count())
When calling g with inputs like "abc", "ABC", I noticed that itertools.count() mysteriously "jumps".
I expected the same result as using zip with itertools.count() directly, like this:
>>> x = itertools.count()
>>> list(zip("abc",x))
[('a', 0), ('b', 1), ('c', 2)]
>>> list(zip("ABC",x))
[('A', 3), ('B', 4), ('C', 5)]
But instead, I get the following -- notice the starting index at the second call of g is 4 instead of 3:
>>> g = functools.partial(zip, itertools.count())
>>> list(g("abc"))
[(0, 'a'), (1, 'b'), (2, 'c')]
>>> list(g("ABC"))
[(4, 'A'), (5, 'B'), (6, 'C')]
Note that you'd get the same result if your original code used arguments in the same order as your altered code:
>>> x = itertools.count()
>>> list(zip(x, "abc"))
[(0, 'a'), (1, 'b'), (2, 'c')]
>>> list(zip(x, "ABC"))
[(4, 'A'), (5, 'B'), (6, 'C')]
zip() tries its first argument first, then its second, then its third ... and stops when one of them is exhausted.
In the spelling just above, after "abc" is exhausted, it goes back to the first argument and gets 3 from x. But its second argument is already exhausted, so zip() stops, and the 3 is silently lost.
Then it moves on to the second zip(), and starts by getting 4 from x.
partial() really has nothing to do with it.
It'll be easy to see why if you encapsulate itertools.count() inside a function:
import functools
import itertools
def count():
    c = itertools.count()
    while True:
        v = next(c)
        print('yielding', v)
        yield v
g = functools.partial(zip, count())
list(g("abc"))
The output is
yielding 0
yielding 1
yielding 2
yielding 3
[(0, 'a'), (1, 'b'), (2, 'c')]
You'll see that zip pulls the next value from count() (so an extra value, 3, is yielded) before it realises there is nothing left in the second iterable.
As an exercise, reverse the arguments and you'll see the evaluation is a little different.
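The effect of argument order can also be checked without printing. In this sketch (plain itertools.count, Python 3 assumed), putting the infinite counter first loses one value per zip call, while putting it last loses nothing:

```python
import itertools

# Counter first: zip pulls 3 from the counter, then finds "abc" exhausted
x = itertools.count()
first = list(zip(x, "abc"))
lost_next = next(x)  # 3 was consumed and discarded, so this is 4

# Counter last: zip sees "abc" exhausted before touching the counter
y = itertools.count()
second = list(zip("abc", y))
kept_next = next(y)  # still 3: nothing was lost
print(first, lost_next)   # [(0, 'a'), (1, 'b'), (2, 'c')] 4
print(second, kept_next)  # [('a', 0), ('b', 1), ('c', 2)] 3
```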

Pairwise circular Python 'for' loop

Is there a nice Pythonic way to loop over a list, returning a pair of elements? The last element should be paired with the first.
So for instance, if I have the list [1, 2, 3], I would like to get the following pairs:
1 - 2
2 - 3
3 - 1
A Pythonic way to access a list pairwise is: zip(L, L[1:]). To connect the last item to the first one:
>>> L = [1, 2, 3]
>>> list(zip(L, L[1:] + L[:1]))
[(1, 2), (2, 3), (3, 1)]
I would use a deque with zip to achieve this.
>>> from collections import deque
>>>
>>> l = [1,2,3]
>>> d = deque(l)
>>> d.rotate(-1)
>>> list(zip(l, d))
[(1, 2), (2, 3), (3, 1)]
I'd use a slight modification to the pairwise recipe from the itertools documentation:
import itertools
def pairwise_circle(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ... (s<last>,s0)"
    a, b = itertools.tee(iterable)
    first_value = next(b, None)
    return itertools.zip_longest(a, b, fillvalue=first_value)
This will simply keep a reference to the first value and when the second iterator is exhausted, zip_longest will fill the last place with the first value.
(Also note that it works with iterators like generators as well as iterables like lists/tuples.)
Note that @Barry's solution is very similar to this but a bit easier to understand in my opinion and easier to extend beyond one element.
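A quick check of the claims above, with the pairwise_circle definition repeated so the snippet runs on its own (Python 3 assumed): it accepts a one-shot generator, returns nothing for empty input, and pairs a single element with itself.

```python
import itertools


def pairwise_circle(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ... (s<last>,s0)"
    a, b = itertools.tee(iterable)
    first_value = next(b, None)
    return itertools.zip_longest(a, b, fillvalue=first_value)


from_gen = list(pairwise_circle(n for n in [1, 2, 3]))  # works on a generator
empty = list(pairwise_circle([]))                       # nothing to pair
single = list(pairwise_circle([7]))                     # wraps onto itself
print(from_gen, empty, single)
```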
I would pair itertools.cycle with zip:
import itertools
def circular_pairwise(l):
    second = itertools.cycle(l)
    next(second)
    return zip(l, second)
cycle returns an iterable that yields the values of its argument in order, looping from the last value to the first.
We skip the first value, so it starts at position 1 (rather than 0).
Next, we zip it with the original, unmutated list. zip is good, because it stops when any of its argument iterables are exhausted.
Doing it this way avoids the creation of any intermediate lists: cycle holds a reference to the original, but doesn't copy it. zip operates in the same way.
It's important to note that this will break if the input is an iterator, such as a file (or a map or zip object in Python 3), because advancing it in one place (through next(second)) also advances it everywhere else. This is easily solved using itertools.tee, which produces two independently operating iterators over the original iterable:
def circular_pairwise(it):
    first, snd = itertools.tee(it)
    second = itertools.cycle(snd)
    next(second)
    return zip(first, second)
tee can use large amounts of additional storage, for example, if one of the returned iterators is used up before the other is touched, but as we only ever have one step difference, the additional storage is minimal.
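To see why the tee step matters, this sketch runs both versions from the answer on a one-shot iterator (map(int, "123")). The list-oriented version shares a single iterator between zip and cycle and silently returns the wrong pairs:

```python
import itertools


def circular_pairwise_naive(l):
    second = itertools.cycle(l)
    next(second)
    return zip(l, second)


def circular_pairwise(it):
    first, snd = itertools.tee(it)
    second = itertools.cycle(snd)
    next(second)
    return zip(first, second)


naive = list(circular_pairwise_naive(map(int, "123")))
fixed = list(circular_pairwise(map(int, "123")))
print(naive)  # [(2, 3)]: zip and cycle both advance the same iterator
print(fixed)  # [(1, 2), (2, 3), (3, 1)]
```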
There are more efficient ways (that don't build temporary lists), but I think this is the most concise:
>>> l = [1,2,3]
>>> list(zip(l, (l+l)[1:]))
[(1, 2), (2, 3), (3, 1)]
If you like the accepted answer,
zip(L, L[1:] + L[:1])
you can go much more memory light with semantically the same code using itertools:
from itertools import islice, chain #, izip as zip # uncomment if Python 2
And this barely materializes anything in memory beyond the original list (assuming the list is relatively large):
zip(l, chain(islice(l, 1, None), islice(l, None, 1)))
To use, just consume (for example, with a list):
>>> list(zip(l, chain(islice(l, 1, None), islice(l, None, 1))))
[(1, 2), (2, 3), (3, 1)]
This can be made extensible to any width:
def cyclical_window(l, width=2):
    return zip(*[chain(islice(l, i, None), islice(l, None, i)) for i in range(width)])
and usage:
>>> l = [1, 2, 3, 4, 5]
>>> cyclical_window(l)
<itertools.izip object at 0x112E7D28>
>>> list(cyclical_window(l))
[(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)]
>>> list(cyclical_window(l, 4))
[(1, 2, 3, 4), (2, 3, 4, 5), (3, 4, 5, 1), (4, 5, 1, 2), (5, 1, 2, 3)]
Unlimited generation with itertools.tee with cycle
You can also use tee to avoid making a redundant cycle object:
from itertools import cycle, tee
ic1, ic2 = tee(cycle(l))
next(ic2) # must still queue up the next item
and now:
>>> [(next(ic1), next(ic2)) for _ in range(10)]
[(1, 2), (2, 3), (3, 1), (1, 2), (2, 3), (3, 1), (1, 2), (2, 3), (3, 1), (1, 2)]
This is incredibly efficient, an expected usage of iter with next, and elegant usage of cycle, tee, and zip.
Don't pass cycle directly to list unless you have saved your work and have time for your computer to creep to a halt as you max out its memory - if you're lucky, after a while your OS will kill the process before it crashes your computer.
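One way to make the bound explicit when consuming the infinite tee/cycle pair, and to avoid accidentally calling list() on an infinite iterator, is to zip the two iterators and cut the result with islice. A sketch under the same setup (l = [1, 2, 3]):

```python
from itertools import cycle, islice, tee

l = [1, 2, 3]
ic1, ic2 = tee(cycle(l))
next(ic2)  # offset the second iterator by one step
# zip(ic1, ic2) is infinite; islice takes exactly 5 pairs and stops
first_five = list(islice(zip(ic1, ic2), 5))
print(first_five)  # [(1, 2), (2, 3), (3, 1), (1, 2), (2, 3)]
```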
Pure Python Builtin Functions
Finally, no standard lib imports, but this only works for indices up to the length of the original list (IndexError otherwise):
>>> [(l[i], l[i - len(l) + 1]) for i in range(len(l))]
[(1, 2), (2, 3), (3, 1)]
You can continue this with modulo:
>>> len_l = len(l)
>>> [(l[i % len_l], l[(i + 1) % len_l]) for i in range(10)]
[(1, 2), (2, 3), (3, 1), (1, 2), (2, 3), (3, 1), (1, 2), (2, 3), (3, 1), (1, 2)]
I would use a list comprehension, and take advantage of the fact that l[-1] is the last element.
>>> l = [1,2,3]
>>> [(l[i-1],l[i]) for i in range(len(l))]
[(3, 1), (1, 2), (2, 3)]
You don't need a temporary list that way.
Amazing how many different ways there are to solve this problem.
Here's one more. You can use the pairwise recipe but instead of zipping with b, chain it with the first element that you already popped off. Don't need to cycle when we just need a single extra value:
from itertools import chain, tee
def pairwise_circle(iterable):
    a, b = tee(iterable)
    first = next(b, None)
    return zip(a, chain(b, (first,)))  # on Python 2, use itertools.izip instead of zip
I like a solution that does not modify the original list and does not copy the list to temporary storage:
def circular(a_list):
    for index in range(len(a_list) - 1):
        yield a_list[index], a_list[index + 1]
    yield a_list[-1], a_list[0]
for x in circular([1, 2, 3]):
    print(x)
Output:
(1, 2)
(2, 3)
(3, 1)
I can imagine this being used on some very large in-memory data.
This one will work even if the list l has consumed most of the system's memory. (If something guarantees this case to be impossible, then zip as posted by chepner is fine)
l.append(l[0])
for i in range(len(l) - 1):
    pair = l[i], l[i + 1]
    # stuff involving pair
del l[-1]
or, more generally (this works for any offset n, i.e. l[(i + n) % len(l)]):
for i in range(len(l)):
    pair = l[i], l[(i + 1) % len(l)]
    # stuff
provided you are on a system with decently fast modulo division (i.e. not some pea-brained embedded system).
There seems to be an often-held belief that indexing a list with an integer subscript is un-Pythonic and best avoided. Why?
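One risk of the append/del approach is that an exception inside the loop leaves the extra element in l. A hypothetical helper (process_circular_pairs and handle are illustrative names, not from the answer) that uses try/finally to guarantee the list is restored:

```python
def process_circular_pairs(l, handle):
    """Call handle(a, b) for each circular pair, temporarily extending l in place."""
    l.append(l[0])  # raises IndexError on an empty list, before l is modified
    try:
        for i in range(len(l) - 1):
            handle(l[i], l[i + 1])
    finally:
        del l[-1]  # always undo the append, even if handle() raises


data = [1, 2, 3]
seen = []
process_circular_pairs(data, lambda a, b: seen.append((a, b)))
print(seen)  # [(1, 2), (2, 3), (3, 1)]
print(data)  # [1, 2, 3]: back to the original contents
```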
This is my solution, and it looks Pythonic enough to me:
l = [1, 2, 3]
for n, v in enumerate(l):
    try:
        print(v, l[n + 1])
    except IndexError:
        print(v, l[0])
prints:
1 2
2 3
3 1
The generator function version:
def f(iterable):
    for n, v in enumerate(iterable):
        try:
            yield (v, iterable[n + 1])
        except IndexError:
            yield (v, iterable[0])
>>> list(f([1,2,3]))
[(1, 2), (2, 3), (3, 1)]
How about this?
li = li+[li[0]]
pairwise = [(li[i],li[i+1]) for i in range(len(li)-1)]
from itertools import chain, islice
itr = zip(l, chain(islice(l, 1, None), islice(l, 1)))  # on Python 2, use itertools.izip instead of zip
(As above with @j-f-sebastian's "zip" answer, but using itertools.)
NB: EDITED given helpful nudge from @200_success. previously was:
itr = izip(l, chain(l[1:], l[:1]))
If you don't want to consume too much memory, you can try my solution:
[(l[i], l[(i+1) % len(l)]) for i, v in enumerate(l)]
It's a little slower, but consumes less memory.
Starting in Python 3.10, the new pairwise function provides a way to create sliding pairs of consecutive elements:
from itertools import pairwise
# l = [1, 2, 3]
list(pairwise(l + l[:1]))
# [(1, 2), (2, 3), (3, 1)]
or simply pairwise(l + l[:1]) if you don't need the result as a list.
Note that we pairwise on the list appended with its head (l + l[:1]) so that rolling pairs are circular (i.e. so that we also include the (3, 1) pair):
list(pairwise(l)) # [(1, 2), (2, 3)]
l + l[:1] # [1, 2, 3, 1]
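The l + l[:1] trick needs a list. For an arbitrary iterable, one option is to chain the first element onto the end before calling pairwise. This sketch assumes Python 3.10+ for itertools.pairwise and falls back to the pairwise recipe from the itertools documentation on older versions; circular_pairs is a hypothetical name:

```python
from itertools import chain, tee

try:
    from itertools import pairwise  # Python 3.10+
except ImportError:
    def pairwise(iterable):
        # Recipe from the itertools documentation for older Pythons
        a, b = tee(iterable)
        next(b, None)
        return zip(a, b)


def circular_pairs(iterable):
    it = iter(iterable)
    try:
        first = next(it)
    except StopIteration:
        return iter(())  # empty input: no pairs
    # first, rest..., first: pairwise then wraps the last element back to the first
    return pairwise(chain((first,), it, (first,)))


print(list(circular_pairs([1, 2, 3])))       # [(1, 2), (2, 3), (3, 1)]
print(list(circular_pairs(c for c in "ab")))  # [('a', 'b'), ('b', 'a')]
```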
Just another try
>>> L = [1,2,3]
>>> list(zip(L,L[1:])) + [(L[-1],L[0])]
[(1, 2), (2, 3), (3, 1)]
L = [1, 2, 3]
a = zip(L, L[1:]+L[:1])
for i in a:
    b = list(i)
    print(b)
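This seems like combinations would do the job:
from itertools import combinations
x = combinations([1, 2, 3], 2)
This returns an iterator, which can then be iterated over like so:
for i in x:
    print(i)
The results would look like this:
(1, 2)
(1, 3)
(2, 3)
Note, however, that this yields (1, 3) rather than (3, 1), and for longer lists combinations produces every pair rather than only adjacent ones, so it only coincides with the circular pairs for three elements, and even then the last pair is reversed.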

pass list of iterables to itertools function

I am using the itertools.product function. I have a 2-deep nested list, which is a list of iterables. I want to pass this to the product function but don't know how to format it correctly.
To be clear, I want
In [37]: [k for k in product([1,2],['a','b'])]
Out[37]: [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]
but generated from a nested_list input like this:
nested_list = [[1,2],['a','b']]
but instead I get:
In [36]: [k for k in product(nested_list)]
Out[36]: [([1, 2],), (['a', 'b'],)]
product takes a variable number of arguments, so you need to unpack your list.
list(product(*nested_list)) # without list() normally, of course
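Unpacking with * works for any number of inner lists, so the same call scales to deeper nestings. A short sketch:

```python
from itertools import product

nested_list = [[1, 2], ['a', 'b']]
pairs = list(product(*nested_list))  # same as product([1, 2], ['a', 'b'])
print(pairs)  # [(1, 'a'), (1, 'b'), (2, 'a'), (2, 'b')]

deeper = [[1, 2], ['a', 'b'], [True]]
triples = list(product(*deeper))  # three inner lists -> 3-tuples
print(triples)
```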

Is there a python builtin to create tuples from multiple lists?

Is there a python builtin that does the same as tupler for a set of lists, or something similar:
def tupler(arg1, *args):
    length = min([len(arg1)] + [len(x) for x in args])
    out = []
    for i in range(length):
        # args is a tuple, so convert it before concatenating with a list
        out.append(tuple([x[i] for x in [arg1] + list(args)]))
    return out
so, for example:
tupler([1,2,3,4],[5,6,7])
returns:
[(1,5),(2,6),(3,7)]
Or perhaps there is a proper Pythonic way of doing this, or a similar generator?
I think you're looking for zip():
>>> list(zip([1,2,3,4],[5,6,7]))
[(1, 5), (2, 6), (3, 7)]
have a look at the built-in zip function http://docs.python.org/library/functions.html#zip
it can also handle more than two lists, say n, and then creates n-tuples.
>>> list(zip([1,2,3,4], [5,6,7,8], [9,10,11,12], [13,14]))
[(1, 5, 9, 13), (2, 6, 10, 14)]
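Like tupler, zip stops at the shortest input. If the extra items should be kept instead, itertools.zip_longest (Python 3) pads the missing positions. A short comparison:

```python
from itertools import zip_longest

a, b = [1, 2, 3, 4], [5, 6, 7]
shortest = list(zip(a, b))                     # truncated, like tupler
padded = list(zip_longest(a, b))               # missing values become None
filled = list(zip_longest(a, b, fillvalue=0))  # or any fill value you choose
print(shortest)  # [(1, 5), (2, 6), (3, 7)]
print(padded)    # [(1, 5), (2, 6), (3, 7), (4, None)]
print(filled)    # [(1, 5), (2, 6), (3, 7), (4, 0)]
```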
list(zip([1,2,3,4],[5,6,7]))
--->[(1,5),(2,6),(3,7)]
args = [(1,5),(2,6),(3,7)]
list(zip(*args))
--->[(1,2,3),(5,6,7)]
The proper way is to use the zip function.
Alternatively, we can use a list comprehension and the built-in enumerate function
to achieve the same result.
>>> L1 = [1,2,3,4]
>>> L2 = [5,6,7]
>>> [(value, L2[i]) for i, value in enumerate(L1) if i < len(L2)]
[(1, 5), (2, 6), (3, 7)]
>>>
The drawback in the above example is that we don't always iterate over the list with the minimum length.
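One way to address that drawback without zip is to index only up to the shorter length. A sketch:

```python
L1 = [1, 2, 3, 4]
L2 = [5, 6, 7]
n = min(len(L1), len(L2))  # iterate only as far as the shorter list
pairs = [(L1[i], L2[i]) for i in range(n)]
print(pairs)  # [(1, 5), (2, 6), (3, 7)]
```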

Transpose/Unzip Function (inverse of zip)?

I have a list of 2-item tuples and I'd like to convert them to 2 lists where the first contains the first item in each tuple and the second list holds the second item.
For example:
original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
# and I want to become...
result = (['a', 'b', 'c', 'd'], [1, 2, 3, 4])
Is there a builtin function that does that?
In 2.x, zip is its own inverse! Provided you use the special * operator.
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
This is equivalent to calling zip with each element of the list as a separate argument:
zip(('a', 1), ('b', 2), ('c', 3), ('d', 4))
except the arguments are passed to zip directly (after being converted to a tuple), so there's no need to worry about the number of arguments getting too big.
In 3.x, zip returns a lazy iterator, but this is trivially converted:
>>> list(zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)]))
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
You could also do
result = ([ a for a,b in original ], [ b for a,b in original ])
It should scale better. Especially if Python makes good on not expanding the list comprehensions unless needed.
(Incidentally, it makes a 2-tuple (pair) of lists, rather than a list of tuples, like zip does.)
If generators instead of actual lists are ok, this would do that:
result = (( a for a,b in original ), ( b for a,b in original ))
The generators don't munch through the list until you ask for each element, but on the other hand, they do keep references to the original list.
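A small sketch of what keeping references to the original list means in practice: the generator expression grabs an iterator over original when it is created, but reads the elements only when consumed, so items appended in between still show up.

```python
original = [('a', 1), ('b', 2)]
firsts = (a for a, b in original)  # nothing has been read yet
original.append(('c', 3))          # mutate before consuming
consumed = list(firsts)
print(consumed)  # ['a', 'b', 'c']: the late append is included
```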
I like to use zip(*iterable) (which is the piece of code you're looking for) in my programs like so:
def unzip(iterable):
    return zip(*iterable)
I find unzip more readable.
If you have lists that are not the same length, you may not want to use zip as per Patrick's answer. This works:
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
But with different length lists, zip truncates each item to the length of the shortest list:
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e')]
In Python 2, you can use map with None as the function to fill missing results with None:
>>> map(None, *[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e'), (1, 2, 3, 4, None)]
zip() is marginally faster though.
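map(None, ...) no longer works in Python 3. There, itertools.zip_longest gives the same padded result; a sketch:

```python
from itertools import zip_longest

rows = [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e',)]
padded = list(zip_longest(*rows))  # missing values are filled with None
print(padded)  # [('a', 'b', 'c', 'd', 'e'), (1, 2, 3, 4, None)]
```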
To get a tuple of lists, as in the question:
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple([list(tup) for tup in zip(*original)])
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])
To unpack the two lists into separate variables:
list1, list2 = [list(tup) for tup in zip(*original)]
Naive approach
def transpose_finite_iterable(iterable):
    return zip(*iterable)  # `itertools.izip` for Python 2 users
works fine for a finite iterable (e.g. a sequence such as list/tuple/str) of (potentially infinite) iterables, which can be illustrated like this:
| |a_00| |a_10| ... |a_n0| |
| |a_01| |a_11| ... |a_n1| |
| |... | |... | ... |... | |
| |a_0i| |a_1i| ... |a_ni| |
| |... | |... | ... |... | |
where
n in ℕ,
a_ij corresponds to j-th element of i-th iterable,
and after applying transpose_finite_iterable we get
| |a_00| |a_01| ... |a_0i| ... |
| |a_10| |a_11| ... |a_1i| ... |
| |... | |... | ... |... | ... |
| |a_n0| |a_n1| ... |a_ni| ... |
A Python example of such a case, where a_ij == j and n == 2:
>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterable(iterable)
>>> next(result)
(0, 0)
>>> next(result)
(1, 1)
But we can't use transpose_finite_iterable again to return to structure of original iterable because result is an infinite iterable of finite iterables (tuples in our case):
>>> transpose_finite_iterable(result)
... hangs ...
Traceback (most recent call last):
File "...", line 1, in ...
File "...", line 2, in transpose_finite_iterable
MemoryError
So how can we deal with this case?
... and here comes the deque
Looking at the docs of the itertools.tee function, there is a Python recipe that, with some modification, can help in our case:
from collections import deque
def transpose_finite_iterables(iterable):
    iterator = iter(iterable)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]
    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()
    return tuple(map(coordinate, queues))
let's check
>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterables(transpose_finite_iterable(iterable))
>>> result
(<generator object transpose_finite_iterables.<locals>.coordinate at ...>, <generator object transpose_finite_iterables.<locals>.coordinate at ...>)
>>> next(result[0])
0
>>> next(result[0])
1
Synthesis
Now we can define a general function for working with iterables of iterables, some of which are finite and others potentially infinite, using the functools.singledispatch decorator like this:
from collections import (abc,
                         deque)
from functools import singledispatch
@singledispatch
def transpose(object_):
    """
    Transposes given object.
    """
    raise TypeError('Unsupported object type: {type}.'
                    .format(type=type(object_)))
@transpose.register(abc.Iterable)
def transpose_finite_iterables(object_):
    """
    Transposes given iterable of finite iterables.
    """
    iterator = iter(object_)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]
    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()
    return tuple(map(coordinate, queues))
def transpose_finite_iterable(object_):
    """
    Transposes given finite iterable of iterables.
    """
    yield from zip(*object_)
try:
    transpose.register(abc.Collection, transpose_finite_iterable)
except AttributeError:
    # Python3.5-
    transpose.register(abc.Mapping, transpose_finite_iterable)
    transpose.register(abc.Sequence, transpose_finite_iterable)
    transpose.register(abc.Set, transpose_finite_iterable)
which can be considered its own inverse (mathematicians call this kind of function an "involution") in the class of binary operators over finite non-empty iterables.
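The dispatch rule this relies on can be checked in isolation. In this sketch (describe is a hypothetical function), singledispatch picks the most specific registered ABC: a list or dict is a Collection, which is more specific than Iterable, while a generator is only an Iterable.

```python
from collections import abc
from functools import singledispatch


@singledispatch
def describe(obj):
    # Fallback for anything not matched by a registered ABC
    raise TypeError('Unsupported object type: {}.'.format(type(obj)))


@describe.register(abc.Iterable)
def _(obj):
    return 'iterable'


@describe.register(abc.Collection)
def _(obj):
    return 'collection'


print(describe([1, 2]))               # collection
print(describe(x for x in range(3)))  # iterable
```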
As a bonus of single dispatch, we can handle numpy arrays like this:
import numpy as np
...
transpose.register(np.ndarray, np.transpose)
and then use it like
>>> array = np.arange(4).reshape((2,2))
>>> array
array([[0, 1],
       [2, 3]])
>>> transpose(array)
array([[0, 2],
       [1, 3]])
Note
Since transpose returns iterators, if someone wants a tuple of lists as in the OP, this can be done with the built-in map function:
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple(map(list, transpose(original)))
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])
Advertisement
I've added a generalized solution to the lz package, starting from version 0.5.0, which can be used like this:
>>> from lz.transposition import transpose
>>> list(map(tuple, transpose(zip(range(10), range(10, 20)))))
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)]
P.S.
There is no (at least no obvious) solution for handling a potentially infinite iterable of potentially infinite iterables, but this case is less common.
It's only another way to do it but it helped me a lot so I write it here:
Having this data structure:
X=[1,2,3,4]
Y=['a','b','c','d']
XY=list(zip(X,Y))
Resulting in:
In: XY
Out: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
The more pythonic way to unzip it and go back to the original is this one in my opinion:
x,y=zip(*XY)
But this returns tuples, so if you need lists you can use:
x,y=(list(x),list(y))
Consider using more_itertools.unzip:
>>> from more_itertools import unzip
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> [list(x) for x in unzip(original)]
[['a', 'b', 'c', 'd'], [1, 2, 3, 4]]
None of the previous answers efficiently provide the required output, which is a tuple of lists, rather than a list of tuples. For the former, you can use tuple with map. Here's the difference:
res1 = list(zip(*original)) # [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
res2 = tuple(map(list, zip(*original))) # (['a', 'b', 'c', 'd'], [1, 2, 3, 4])
In addition, most of the previous solutions assume Python 2.7, where zip returns a list rather than an iterator.
For Python 3.x, you will need to pass the result to a function such as list or tuple to exhaust the iterator. For memory-efficient iterators, you can omit the outer list and tuple calls for the respective solutions.
While numpy arrays and pandas may be preferable, this function imitates the behavior of zip(*args) when called as unzip(args).
It allows generators, like the result of zip in Python 3, to be passed as args, since it iterates through the values.
def unzip(items, cls=list, ocls=tuple):
    """Zip function in reverse.
    :param items: Zipped-like iterable.
    :type items: iterable
    :param cls: Container factory. Callable that returns iterable containers,
        with a callable append attribute, to store the unzipped items. Defaults
        to ``list``.
    :type cls: callable, optional
    :param ocls: Outer container factory. Callable that takes an optional
        iterable and returns an iterable container holding the inner
        containers (see ``cls``). Defaults to ``tuple``.
    :type ocls: callable, optional
    :returns: Unzipped items in instances returned from ``cls``, in an instance
        returned from ``ocls``.
    """
    # iter() will return the same iterator passed to it whenever possible.
    items = iter(items)
    try:
        i = next(items)
    except StopIteration:
        return ocls()
    unzipped = ocls(cls([v]) for v in i)
    for i in items:
        for c, v in zip(unzipped, i):
            c.append(v)
    return unzipped
To use list containers, simply call unzip(zipped), as in
unzip(zip(["a","b","c"],[1,2,3])) == (["a","b","c"],[1,2,3])
To use deques, or any other container with an append method, pass a factory function.
from collections import deque
unzip([("a",1),("b",2)], deque, list) == [deque(["a","b"]),deque([1,2])]
(Decorate cls and/or ocls to micromanage container initialization, as briefly shown in the final example above.)
Since it returns tuples (and can use tons of memory), the zip(*zipped) trick seems more clever than useful, to me.
Here's a function that will actually give you the inverse of zip.
def unzip(zipped):
    """Inverse of built-in zip function.
    Args:
        zipped: a list of tuples
    Returns:
        a tuple of lists
    Example:
        a = [1, 2, 3]
        b = [4, 5, 6]
        zipped = list(zip(a, b))
        assert zipped == [(1, 4), (2, 5), (3, 6)]
        unzipped = unzip(zipped)
        assert unzipped == ([1, 2, 3], [4, 5, 6])
    """
    unzipped = ()
    if len(zipped) == 0:
        return unzipped
    dim = len(zipped[0])
    for i in range(dim):
        unzipped = unzipped + ([tup[i] for tup in zipped], )
    return unzipped
While zip(*seq) is very useful, it may be unsuitable for very long sequences, as it will create a tuple of values to be passed in. For example, I've been working with a coordinate system with over a million entries and find it significantly faster to create the sequences directly.
A generic approach would be something like this:
from collections import deque
seq = ((a1, b1, …), (a2, b2, …), …)  # placeholder: your sequence of equal-length tuples
width = len(seq[0])
output = [deque() for _ in range(width)]  # one independent deque per column
for element in seq:
    for s, item in zip(output, element):
        s.append(item)
But, depending on what you want to do with the result, the choice of collection can make a big difference. In my actual use case, using sets and no internal loop, is noticeably faster than all other approaches.
And, as others have noted, if you are doing this with datasets, it might make sense to use Numpy or Pandas collections instead.
Just to summarize:
# data
a = ('a', 'b', 'c', 'd')
b = (1, 2, 3, 4)
# forward
zipped = zip(a, b) # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
# reverse
a_, b_ = zip(*zipped)
# verify
assert a == a_
assert b == b_
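One edge case of this round trip: when the zipped data is empty, zip(*...) yields no columns at all, so the tuple unpacking fails instead of giving two empty tuples. A sketch with a hypothetical two-column helper, safe_unzip2:

```python
def safe_unzip2(pairs):
    """Unzip an iterable of 2-tuples into two tuples, handling empty input."""
    columns = tuple(zip(*pairs))
    if not columns:
        return (), ()  # zip(*[]) produces no columns
    a, b = columns
    return a, b


try:
    a_, b_ = zip(*[])
except ValueError as exc:
    print('plain unpacking fails:', exc)

print(safe_unzip2([]))                    # ((), ())
print(safe_unzip2([('a', 1), ('b', 2)]))  # (('a', 'b'), (1, 2))
```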
Here's a simple one-line answer that produces the desired output:
original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
list(zip(*original))
# [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
