Python enumerate downwards or with a custom step

How can I make Python's enumerate function count from larger numbers to smaller (descending order, decrementing, counting down)? Or, in general, how can I use a different step increment/decrement with enumerate?
For example, such a function, applied to the list ['a', 'b', 'c'] with start value 10 and step -2, would produce the iterator [(10, 'a'), (8, 'b'), (6, 'c')].

I haven't found a more elegant, idiomatic, and concise way than to write a simple generator:
def enumerate2(xs, start=0, step=1):
    for x in xs:
        yield (start, x)
        start += step
Examples:
>>> list(enumerate2([1,2,3], 5, -1))
[(5, 1), (4, 2), (3, 3)]
>>> list(enumerate2([1,2,3], 5, -2))
[(5, 1), (3, 2), (1, 3)]
If you don't understand the above code, read What does the "yield" keyword do in Python? and Difference between Python's Generators and Iterators.

One option is to zip your iterable to a range:
for index, item in zip(range(10, 0, -2), ['a', 'b', 'c']):
    ...
This does have the limitation that you need to know how far the range should go (the minimum it should cover - as in my example, excess will be truncated by zip).
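If you do know the length up front, one way (a minimal sketch; start and step are just the question's example values) is to compute the range endpoint from it so nothing needs truncating:
items = ['a', 'b', 'c']
start, step = 10, -2
stop = start + step * len(items)  # 4, which range() excludes
list(zip(range(start, stop, step), items))
# [(10, 'a'), (8, 'b'), (6, 'c')]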
If you don't know, you could roll your own "infinite range" (or just use itertools.count) and use that:
>>> def inf_range(start, step):
...     """Generator function to provide a never-ending range."""
...     while True:
...         yield start
...         start += step
>>> list(zip(inf_range(10, -2), ['a', 'b', 'c']))
[(10, 'a'), (8, 'b'), (6, 'c')]

Here is an idiomatic way to do that:
import itertools
list(zip(itertools.count(10, -2), 'abc'))
returns:
[(10, 'a'), (8, 'b'), (6, 'c')]

Another option is to use itertools.count, which is helpful for "enumerating" by a step, in reverse.
import itertools
counter = itertools.count(10, -2)
[(next(counter), letter) for letter in ["a", "b", "c"]]
# [(10, 'a'), (8, 'b'), (6, 'c')]
Characteristics
- concise
- the step and direction logic is compactly stored in count()
- enumerated indices are iterated with next()
- count() is inherently infinite, which is useful if the terminal boundary is unknown (see jonrsharpe's answer above)
- the sequence length intrinsically terminates the infinite iterator
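For instance, if you ever do need a bounded version of the counter, itertools.islice can cap it (a minimal sketch):
import itertools
list(itertools.islice(itertools.count(10, -2), 3))
# [10, 8, 6]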

If you don't need the iterator kept in a variable and just want to iterate over some container, multiply your index by the step.
container = ['a', 'b', 'c']
step = -2
for index, value in enumerate(container):
    print(f'{index * step}, {value}')
Output:
0, a
-2, b
-4, c
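If you also need a non-zero start, the same trick extends naturally (a small sketch; the start and step values are just examples):
container = ['a', 'b', 'c']
start, step = 10, -2
for index, value in enumerate(container):
    print(f'{start + index * step}, {value}')
Output:
10, a
8, b
6, c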

It may not be very elegant, but using f-strings the following quick solution can be handy (counter grows by 1 and p by 9 each iteration, so the printed index effectively steps by 10):
my_list = ['apple', 'banana', 'grapes', 'pear']
p = 10
for counter, value in enumerate(my_list):
    print(f" {counter + p}, {value}")
    p += 9
> 10, apple
> 20, banana
> 30, grapes
> 40, pear

Related

why does itertools.count() consume an extra element when used with zip?

I was trying to use functools.partial with itertools.count, by currying zip with itertools.count():
g = functools.partial(zip, itertools.count())
When calling g with inputs like "abc", "ABC", I noticed that itertools.count() mysteriously "jumps".
I thought I should get the same result as directly using zip with itertools.count()? like:
>>> x = itertools.count()
>>> list(zip("abc",x))
[('a', 0), ('b', 1), ('c', 2)]
>>> list(zip("ABC",x))
[('A', 3), ('B', 4), ('C', 5)]
But instead, I get the following -- notice the starting index at the second call of g is 4 instead of 3:
>>> g = functools.partial(zip, itertools.count())
>>> list(g("abc"))
[(0, 'a'), (1, 'b'), (2, 'c')]
>>> list(g("ABC"))
[(4, 'A'), (5, 'B'), (6, 'C')]
Note that you'd get the same result if your original code used arguments in the same order as your altered code:
>>> x = itertools.count()
>>> list(zip(x, "abc"))
[(0, 'a'), (1, 'b'), (2, 'c')]
>>> list(zip(x, "ABC"))
[(4, 'A'), (5, 'B'), (6, 'C')]
zip() tries its first argument first, then its second, then its third ... and stops when one of them is exhausted.
In the spelling just above, after "abc" is exhausted, it goes back to the first argument and gets 3 from x. But its second argument is already exhausted, so zip() stops, and the 3 is silently lost.
Then it moves on to the second zip(), and starts by getting 4 from x.
partial() really has nothing to do with it.
It'll be easy to see why if you encapsulate itertools.count() inside a function:
import functools
import itertools

def count():
    c = itertools.count()
    while True:
        v = next(c)
        print('yielding', v)
        yield v

g = functools.partial(zip, count())
list(g("abc"))
The output is
yielding 0
yielding 1
yielding 2
yielding 3
[(0, 'a'), (1, 'b'), (2, 'c')]
You'll see that zip evaluates the next value from count() (so an extra value, 3, is yielded) before it realises there isn't anything left in the second iterable.
As an exercise, reverse the arguments and you'll see the evaluation is a little different.
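Here is that reversed-arguments experiment spelled out (same count() wrapper as above); no extra value is yielded, because zip exhausts the string before touching the counter:
list(zip("abc", count()))
# yielding 0
# yielding 1
# yielding 2
# [('a', 0), ('b', 1), ('c', 2)]
This also suggests a workaround sketch if you want one shared counter across calls without losing values: pass the finite iterable first, so the counter is only consumed for elements that are actually paired:
import itertools
counter = itertools.count()
g = lambda s: zip(s, counter)
list(g("abc"))  # [('a', 0), ('b', 1), ('c', 2)]
list(g("ABC"))  # [('A', 3), ('B', 4), ('C', 5)]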

Python 3: Reverse consecutive runs in sorted list?

This question is an extension of What's the most Pythonic way to identify consecutive duplicates in a list?.
Suppose you have a list of tuples:
my_list = [(1,4), (2,3), (3,2), (4,4), (5,2)]
and you sort it by each tuple's last value:
my_list = sorted(my_list, key=lambda tuple: tuple[1])
# [(3,2), (5,2), (2,3), (1,4), (4,4)]
then we have two consecutive runs (looking at the last value in each tuple), namely [(3,2), (5,2)] and [(1,4), (4,4)].
What is the pythonic way to reverse each run (not the tuples within), e.g.
reverse_runs(my_list)
# [(5,2), (3,2), (2,3), (4,4), (1,4)]
Is this possible to do within a generator?
UPDATE
It has come to my attention that perhaps the example list was not clear. So instead consider:
my_list = [(1,"A"), (2,"B"), (5,"C"), (4,"C"), (3,"C"), (6,"A"),(7,"A"), (8,"D")]
Where the ideal output from reverse_runs would be
[(7,"A"), (6,"A"), (1,"A"), (2,"B"), (3,"C"), (4,"C"), (5,"C"), (8,"D")]
To be clear on terminology, I am adopting the use of "run" as used in describing TimSort, which is what Python's sort function is based upon, giving it (the sort function) its stability.
Thus if you sort a collection, should the collection be multi-faceted, only the specified dimension is sorted on, and if two elements compare equal on that dimension, their relative order is not altered.
Thus the following function:
sorted(my_list,key=lambda t: t[1])
yields:
[(1, 'A'), (6, 'A'), (7, 'A'), (2, 'B'), (5, 'C'), (4, 'C'), (3, 'C'), (8, 'D')]
and the run on "C" (i.e. (5, 'C'), (4, 'C'), (3, 'C') ) is not disturbed.
So in conclusion the desired output from the yet to be defined function reverse_runs:
1.) sorts the tuples by their last element
2.) maintaining the order of the first element, reverses runs on the last element
Ideally I would like this as a generator function, but that does not (to me, at the moment) seem possible.
Thus one could adopt the following strategy:
1.) Sort the tuples by the last element via sorted(my_list, key=lambda tuple: tuple[1])
2.) Identify the indexes for the last element in each tuple when the succeeding tuple (i+1) is different than the last element in (i). i.e. identify runs
3.) Make an empty list
4.) Using the slice operator, obtain, reverse, and then append each sublist to the empty list
I think this will work.
my_list = [(1,4), (2,3), (3,2), (4,4), (5,2)]
my_list = sorted(my_list, key=lambda tuple: (tuple[1], -tuple[0]))
print(my_list)
Output
[(5, 2), (3, 2), (2, 3), (4, 4), (1, 4)]
I misunderstood the question. This is less pretty, but it should work for what you really want:
from itertools import groupby
from operator import itemgetter
def reverse_runs(l):
    sorted_list = sorted(l, key=itemgetter(1))
    reversed_groups = (reversed(list(g)) for _, g in groupby(sorted_list, key=itemgetter(1)))
    reversed_runs = [e for sublist in reversed_groups for e in sublist]
    return reversed_runs

if __name__ == '__main__':
    print(reverse_runs([(1, 4), (2, 3), (3, 2), (4, 4), (5, 2)]))
    print(reverse_runs([(1, "A"), (2, "B"), (5, "C"), (4, "C"), (3, "C"), (6, "A"), (7, "A"), (8, "D")]))
Output
[(5, 2), (3, 2), (2, 3), (4, 4), (1, 4)]
[(7, 'A'), (6, 'A'), (1, 'A'), (2, 'B'), (3, 'C'), (4, 'C'), (5, 'C'), (8, 'D')]
Generator version:
from itertools import groupby
from operator import itemgetter
def reverse_runs(l):
    sorted_list = sorted(l, key=itemgetter(1))
    reversed_groups = (reversed(list(g)) for _, g in groupby(sorted_list, key=itemgetter(1)))
    for group in reversed_groups:
        yield from group

if __name__ == '__main__':
    print(list(reverse_runs([(1, 4), (2, 3), (3, 2), (4, 4), (5, 2)])))
    print(list(reverse_runs([(1, "A"), (2, "B"), (5, "C"), (4, "C"), (3, "C"), (6, "A"), (7, "A"), (8, "D")])))
The most general case requires 2 sorts. The first sort is a reversed sort on the second criteria. The second sort is a forward sort on the first criteria:
from operator import itemgetter

pass1 = sorted(my_list, key=itemgetter(0), reverse=True)
result = sorted(pass1, key=itemgetter(1))
We can sort in multiple passes like this because python's sort algorithm is guaranteed to be stable.
However, in real life it's often possible to simply construct a more clever key function which allows the sorting to happen in one pass. This usually involves "negating" one of the values and relying on the fact that tuples order themselves lexicographically:
result = sorted(my_list, key=lambda t: (t[1], -t[0]))
In response to your update, it looks like the following might be a suitable solution:
from operator import itemgetter
from itertools import chain, groupby
my_list = [(1,"A"), (2,"B"), (5,"C"), (4,"C"), (3,"C"), (6,"A"),(7,"A"), (8,"D")]
pass1 = sorted(my_list, key=itemgetter(1))
result = list(chain.from_iterable(reversed(list(g)) for k, g in groupby(pass1, key=itemgetter(1))))
print(result)
We can take apart the expression:
chain.from_iterable(reversed(list(g)) for k, g in groupby(pass1, key=itemgetter(1)))
to try to figure out what it's doing...
First, let's look at groupby(pass1, key=itemgetter(1)). groupby will yield 2-tuples. The first item (k) in the tuple is the "key" -- e.g. whatever was returned from itemgetter(1). The key isn't really important here after the grouping has taken place, so we don't use it. The second item (g -- for "group") is an iterable that yields consecutive values that have the same "key". This is exactly the items that you requested, however, they're in the order that they were in after sorting. You requested them in reverse order. In order to reverse an arbitrary iterable, we can construct a list from it and then reverse the list. e.g. reversed(list(g)). Finally, we need to paste those chunks back together again which is where chain.from_iterable comes in.
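To see the intermediate pieces, it can help to print what groupby actually yields (a small inspection sketch using the question's data):
from itertools import groupby
from operator import itemgetter

pass1 = sorted([(1, "A"), (2, "B"), (5, "C"), (4, "C"), (3, "C"), (6, "A"), (7, "A"), (8, "D")],
               key=itemgetter(1))
for k, g in groupby(pass1, key=itemgetter(1)):
    print(k, list(g))
# A [(1, 'A'), (6, 'A'), (7, 'A')]
# B [(2, 'B')]
# C [(5, 'C'), (4, 'C'), (3, 'C')]
# D [(8, 'D')]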
If we want to get more clever, we might do better from an algorithmic standpoint (assuming that the "key" for the bins is hashable). The trick is to bin the objects in a dictionary and then sort the bins. This means that we're potentially sorting a much shorter list than the original:
from collections import defaultdict, deque
from itertools import chain
my_list = [(1,"A"), (2,"B"), (5,"C"), (4,"C"), (3,"C"), (6,"A"),(7,"A"), (8,"D")]
bins = defaultdict(deque)
for t in my_list:
    bins[t[1]].appendleft(t)

print(list(chain.from_iterable(bins[key] for key in sorted(bins))))
Note that whether this does better than the first approach is very dependent on the initial data. Since TimSort is such a beautiful algorithm, if the data starts already grouped into bins, then this algorithm will likely not beat it (though, I'll leave it as an exercise for you to try...). However, if the data is well scattered (causing TimSort to behave more like MergeSort), then binning first will possibly make for a slight win.
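If you want to check which approach wins for your data, a quick timeit comparison might look like the sketch below (by_sort and by_bins are just local names wrapping the two snippets above; the crossover point depends entirely on how pre-grouped the input is):
import random
import timeit
from collections import defaultdict, deque
from itertools import chain, groupby
from operator import itemgetter

data = [(i, random.choice('ABCD')) for i in range(10000)]

def by_sort(lst):
    pass1 = sorted(lst, key=itemgetter(1))
    return list(chain.from_iterable(
        reversed(list(g)) for _, g in groupby(pass1, key=itemgetter(1))))

def by_bins(lst):
    bins = defaultdict(deque)
    for t in lst:
        bins[t[1]].appendleft(t)
    return list(chain.from_iterable(bins[key] for key in sorted(bins)))

assert by_sort(data) == by_bins(data)
print(timeit.timeit(lambda: by_sort(data), number=100))
print(timeit.timeit(lambda: by_bins(data), number=100))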

zip list with a single element

I have a list of some elements, e.g. [1, 2, 3, 4] and a single object, e.g. 'a'. I want to produce a list of tuples with the elements of the list in the first position and the single object in the second position: [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')].
I could do it with zip like this:
def zip_with_scalar(l, o):  # l - the list; o - the object
    return list(zip(l, [o] * len(l)))
However, this gives me the feeling of creating an unnecessary list of repeated elements.
Another possibility is
def zip_with_scalar(l, o):
    return [(i, o) for i in l]
which is very clean and pythonic indeed, but here I do the whole thing "manually". In Haskell I would do something like
zipWithScalar l o = zip l $ repeat o
Is there any built-in function or trick, either for the zipping with scalar or for something that would enable me to use ordinary zip, i.e. sort-of infinite list?
This is the closest to your Haskell solution:
import itertools

def zip_with_scalar(l, o):
    return zip(l, itertools.repeat(o))
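For example:
>>> list(zip_with_scalar([1, 2, 3, 4], 'a'))
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]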
You could also use generators, which avoid creating a list like comprehensions do:
def zip_with_scalar(l, o):
    return ((i, o) for i in l)
You can use the built-in map function (note that in Python 3 map returns an iterator, so wrap it in list() to materialize the pairs):
>>> elements = [1, 2, 3, 4]
>>> key = 'a'
>>> list(map(lambda e: (e, key), elements))
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
This is a perfect job for itertools.cycle.
from itertools import cycle

def zip_with_scalar(l, o):
    return zip(l, cycle([o]))  # wrap o in a list so non-iterable objects work too
Demo:
>>> from itertools import cycle
>>> l = [1, 2, 3, 4]
>>> list(zip(l, cycle('a')))
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
lst = [1,2,3,4]
tups = [(itm, 'a') for itm in lst]
tups
> [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
>>> l = [1, 2, 3, 4]
>>> list(zip(l, "a"*len(l)))
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
You could also use zip_longest with a fillvalue of o:
from itertools import zip_longest

def zip_with_scalar(l, o):  # l - the list; o - the object
    return zip_longest(l, [o], fillvalue=o)

print(list(zip_with_scalar([1, 2, 3, 4], "a")))
Just be aware that any mutable values used for o won't be copied whether using zip_longest or repeat.
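A quick demonstration of that pitfall: with a mutable o, every tuple ends up sharing the same object, so one mutation shows up everywhere:
>>> from itertools import repeat
>>> pairs = list(zip(range(3), repeat([])))
>>> pairs[0][1].append('x')
>>> pairs
[(0, ['x']), (1, ['x']), (2, ['x'])]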
The more-itertools library recently added a zip_broadcast() function that solves this problem well:
>>> from more_itertools import zip_broadcast
>>> list(zip_broadcast([1,2,3,4], 'a'))
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
This is a much more general solution than the other answers posted here:
- Empty iterables are correctly handled.
- There can be multiple iterable and/or scalar arguments.
- The order of the scalar/iterable arguments doesn't need to be known.
- If there are multiple iterable arguments, you can check that they are the same length with strict=True.
- You can easily control whether or not strings should be treated as iterables (by default they are not).
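For example, mixing several iterables and scalars in one call (assuming a reasonably recent more-itertools) might look like:
>>> list(zip_broadcast([1, 2], 'x', [3, 4], strict=True))
[(1, 'x', 3), (2, 'x', 4)]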
Just define a class with an infinite iterator, initialized with the single element you want injected into the lists:
class zipIterator:
    def __init__(self, val):
        self.__val = val

    def __iter__(self):
        return self

    def __next__(self):
        return self.__val
and then create your new list from this class and the list you have:
elements = [1, 2, 3, 4]
key = 'a'
res = [it for it in zip(elements, zipIterator(key))]
The result would be:
>>> res
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]

Python accessing slices of a list in order

In Python 2.7, I have a list named data, with some tuples, indexed by the first attribute, e.g.
data = [('A', 1, 2, 3), ('A', 10, 20, 30), ('A', 100, 200, 300),
('B', 1, 2, 3), ('B', 10, 20, 30),
('C', 15, 25, 30), ('C', 1, 20, 22), ('C', 100, 3, 8)]
There is a function f() that will work on any slice of data with the first index matching, e.g.
f([x[1:] for x in data[:3]])
I want to call f (in proper sequence) on each slice of the array (group of tuples with the same first index) and compile the list of resulting values in a list.
I'm just starting with Python. Here is my solution; is there a better (faster or more elegant) way to do this?
slices = [x for x in xrange(len(data)) if data[x][0] != data[x-1][0]]
slices.append(len(data))
result = [f(data[start:end]) for start, end in zip(slices[:-1], slices[1:])]
Thank you.
If you want to group on the first item of each tuple, you can do so with itertools.groupby():
from itertools import groupby
from operator import itemgetter
[f(list(g)) for k, g in groupby(data, key=itemgetter(0))]
The itemgetter(0) returns the first element of each tuple, which groupby() then gives you iterables for each group based on that value. Looping over each individual g result will then give you a sequence of tuples with just 'A', then 'B', etc.
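Putting it together with the question's f, passing each group its tuples with the label stripped off as in the question (the f here is a hypothetical stand-in that just sums everything, purely for illustration):
from itertools import groupby
from operator import itemgetter

data = [('A', 1, 2, 3), ('A', 10, 20, 30), ('A', 100, 200, 300),
        ('B', 1, 2, 3), ('B', 10, 20, 30),
        ('C', 15, 25, 30), ('C', 1, 20, 22), ('C', 100, 3, 8)]

def f(rows):  # hypothetical stand-in for the real f
    return sum(sum(r) for r in rows)

result = [f([x[1:] for x in g]) for k, g in groupby(data, key=itemgetter(0))]
print(result)  # [666, 66, 224]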

Transpose/Unzip Function (inverse of zip)?

I have a list of 2-item tuples and I'd like to convert them to 2 lists where the first contains the first item in each tuple and the second list holds the second item.
For example:
original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
# and I want to become...
result = (['a', 'b', 'c', 'd'], [1, 2, 3, 4])
Is there a builtin function that does that?
In 2.x, zip is its own inverse! Provided you use the special * operator.
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
This is equivalent to calling zip with each element of the list as a separate argument:
zip(('a', 1), ('b', 2), ('c', 3), ('d', 4))
except the arguments are passed to zip directly (after being converted to a tuple), so there's no need to worry about the number of arguments getting too big.
In 3.x, zip returns a lazy iterator, but this is trivially converted:
>>> list(zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)]))
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
You could also do
result = ([ a for a,b in original ], [ b for a,b in original ])
It should scale better, since it avoids unpacking every tuple of original into one big argument list the way zip(*original) does.
(Incidentally, it makes a 2-tuple (pair) of lists, rather than a list of tuples, like zip does.)
If generators instead of actual lists are ok, this would do that:
result = (( a for a,b in original ), ( b for a,b in original ))
The generators don't munch through the list until you ask for each element, but on the other hand, they do keep references to the original list.
I like to use zip(*iterable) (which is the piece of code you're looking for) in my programs, like so:
def unzip(iterable):
    return zip(*iterable)
I find unzip more readable.
If you have lists that are not the same length, you may not want to use zip, as per Patrick's answer. This works:
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
But with different length lists, zip truncates each item to the length of the shortest list:
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e')]
In Python 2, you can use map with None as the function to fill missing results with None:
>>> map(None, *[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e'), (1, 2, 3, 4, None)]
zip() is marginally faster though.
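In Python 3, map(None, ...) is gone; itertools.zip_longest is the equivalent:
>>> from itertools import zip_longest
>>> list(zip_longest(*[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )]))
[('a', 'b', 'c', 'd', 'e'), (1, 2, 3, 4, None)]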
To get a tuple of lists, as in the question:
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple([list(tup) for tup in zip(*original)])
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])
To unpack the two lists into separate variables:
list1, list2 = [list(tup) for tup in zip(*original)]
Naive approach
def transpose_finite_iterable(iterable):
    return zip(*iterable)  # `itertools.izip` for Python 2 users
works fine for a finite iterable (e.g. sequences like list/tuple/str) of (potentially infinite) iterables, which can be illustrated like
| |a_00| |a_10| ... |a_n0| |
| |a_01| |a_11| ... |a_n1| |
| |... | |... | ... |... | |
| |a_0i| |a_1i| ... |a_ni| |
| |... | |... | ... |... | |
where
n in ℕ,
a_ij corresponds to j-th element of i-th iterable,
and after applying transpose_finite_iterable we get
| |a_00| |a_01| ... |a_0i| ... |
| |a_10| |a_11| ... |a_1i| ... |
| |... | |... | ... |... | ... |
| |a_n0| |a_n1| ... |a_ni| ... |
A Python example of such a case, where a_ij == j and n == 2:
>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterable(iterable)
>>> next(result)
(0, 0)
>>> next(result)
(1, 1)
But we can't use transpose_finite_iterable again to return to the structure of the original iterable, because result is an infinite iterable of finite iterables (tuples in our case):
>>> transpose_finite_iterable(result)
... hangs ...
Traceback (most recent call last):
File "...", line 1, in ...
File "...", line 2, in transpose_finite_iterable
MemoryError
So how can we deal with this case?
... and here comes the deque
After we take a look at the docs of the itertools.tee function, there is a Python recipe that, with some modification, can help in our case:
from collections import deque

def transpose_finite_iterables(iterable):
    iterator = iter(iterable)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]

    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()

    return tuple(map(coordinate, queues))
Let's check:
>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterables(transpose_finite_iterable(iterable))
>>> result
(<generator object transpose_finite_iterables.<locals>.coordinate at ...>, <generator object transpose_finite_iterables.<locals>.coordinate at ...>)
>>> next(result[0])
0
>>> next(result[0])
1
Synthesis
Now we can define a general function for working with iterables of iterables, some of which are finite and others potentially infinite, using the functools.singledispatch decorator, like
from collections import (abc,
                         deque)
from functools import singledispatch


@singledispatch
def transpose(object_):
    """
    Transposes given object.
    """
    raise TypeError('Unsupported object type: {type}.'
                    .format(type=type(object_)))


@transpose.register(abc.Iterable)
def transpose_finite_iterables(object_):
    """
    Transposes given iterable of finite iterables.
    """
    iterator = iter(object_)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]

    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()

    return tuple(map(coordinate, queues))


def transpose_finite_iterable(object_):
    """
    Transposes given finite iterable of iterables.
    """
    yield from zip(*object_)

try:
    transpose.register(abc.Collection, transpose_finite_iterable)
except AttributeError:
    # Python 3.5 and earlier have no `abc.Collection`
    transpose.register(abc.Mapping, transpose_finite_iterable)
    transpose.register(abc.Sequence, transpose_finite_iterable)
    transpose.register(abc.Set, transpose_finite_iterable)
which can be considered its own inverse (mathematicians call this kind of function an "involution") in the class of binary operators over finite non-empty iterables.
As a bonus of singledispatching we can handle numpy arrays like
import numpy as np
...
transpose.register(np.ndarray, np.transpose)
and then use it like
>>> array = np.arange(4).reshape((2, 2))
>>> array
array([[0, 1],
       [2, 3]])
>>> transpose(array)
array([[0, 2],
       [1, 3]])
Note
Since transpose returns iterators, if someone wants a tuple of lists like in the OP, this can be done additionally with the map built-in function, like
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple(map(list, transpose(original)))
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])
Advertisement
I've added a generalized solution to the lz package starting from version 0.5.0, which can be used like
>>> from lz.transposition import transpose
>>> list(map(tuple, transpose(zip(range(10), range(10, 20)))))
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)]
P.S.
There is no (at least obvious) solution for handling a potentially infinite iterable of potentially infinite iterables, but this case is less common.
This is just another way to do it, but it helped me a lot, so I'll write it here:
Having this data structure:
X=[1,2,3,4]
Y=['a','b','c','d']
XY=zip(X,Y)
Resulting in:
In: XY
Out: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
In my opinion, the more pythonic way to unzip it and go back to the original is:
x, y = zip(*XY)
But this returns tuples, so if you need lists you can use:
x, y = (list(x), list(y))
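Or in one line, if you prefer (same result):
x, y = map(list, zip(*XY))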
Consider using more_itertools.unzip:
>>> from more_itertools import unzip
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> [list(x) for x in unzip(original)]
[['a', 'b', 'c', 'd'], [1, 2, 3, 4]]
None of the previous answers efficiently provide the required output, which is a tuple of lists, rather than a list of tuples. For the former, you can use tuple with map. Here's the difference:
res1 = list(zip(*original)) # [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
res2 = tuple(map(list, zip(*original))) # (['a', 'b', 'c', 'd'], [1, 2, 3, 4])
In addition, most of the previous solutions assume Python 2.7, where zip returns a list rather than an iterator.
For Python 3.x, you will need to pass the result to a function such as list or tuple to exhaust the iterator. For memory-efficient iterators, you can omit the outer list and tuple calls for the respective solutions.
While numpy arrays and pandas may be preferable, this function imitates the behavior of zip(*args) when called as unzip(args).
Allows for generators, like the result from zip in Python 3, to be passed as args as it iterates through values.
def unzip(items, cls=list, ocls=tuple):
    """Zip function in reverse.

    :param items: Zipped-like iterable.
    :type items: iterable
    :param cls: Container factory. Callable that returns iterable containers,
        with a callable append attribute, to store the unzipped items. Defaults
        to ``list``.
    :type cls: callable, optional
    :param ocls: Outer container factory. Callable that returns iterable
        containers, with a callable append attribute, to store the inner
        containers (see ``cls``). Defaults to ``tuple``.
    :type ocls: callable, optional
    :returns: Unzipped items in instances returned from ``cls``, in an instance
        returned from ``ocls``.
    """
    # iter() will return the same iterator passed to it whenever possible.
    items = iter(items)
    try:
        i = next(items)
    except StopIteration:
        return ocls()
    unzipped = ocls(cls([v]) for v in i)
    for i in items:
        for c, v in zip(unzipped, i):
            c.append(v)
    return unzipped
To use list containers, simply run unzip(zipped), as in
unzip(zip(["a", "b", "c"], [1, 2, 3])) == (["a", "b", "c"], [1, 2, 3])
To use deques, or any other container sporting append, pass a factory function.
from collections import deque
unzip([("a", 1), ("b", 2)], deque, list) == [deque(["a", "b"]), deque([1, 2])]
(Decorate cls and/or ocls to micro-manage container initialization, as briefly shown in the final comparison above.)
Since it returns tuples (and can use tons of memory), the zip(*zipped) trick seems more clever than useful, to me.
Here's a function that will actually give you the inverse of zip.
def unzip(zipped):
    """Inverse of built-in zip function.

    Args:
        zipped: a list of tuples

    Returns:
        a tuple of lists

    Example:
        a = [1, 2, 3]
        b = [4, 5, 6]
        zipped = list(zip(a, b))

        assert zipped == [(1, 4), (2, 5), (3, 6)]

        unzipped = unzip(zipped)

        assert unzipped == ([1, 2, 3], [4, 5, 6])
    """
    unzipped = ()
    if len(zipped) == 0:
        return unzipped
    dim = len(zipped[0])
    for i in range(dim):
        unzipped = unzipped + ([tup[i] for tup in zipped],)
    return unzipped
While zip(*seq) is very useful, it may be unsuitable for very long sequences as it will create a tuple of values to be passed in. For example, I've been working with a coordinate system with over a million entries and find it significantly faster to create the sequences directly.
A generic approach would be something like this:
from collections import deque

seq = ((a1, b1, …), (a2, b2, …), …)

width = len(seq[0])
output = [deque() for _ in range(width)]  # one fresh deque per column ([deque()] * width would alias a single deque)

for element in seq:
    for s, item in zip(output, element):
        s.append(item)
But, depending on what you want to do with the result, the choice of collection can make a big difference. In my actual use case, using sets and no internal loop is noticeably faster than all other approaches.
And, as others have noted, if you are doing this with datasets, it might make sense to use Numpy or Pandas collections instead.
Just to summarize:
# data
a = ('a', 'b', 'c', 'd')
b = (1, 2, 3, 4)
# forward
zipped = zip(a, b)  # in Python 3 this is a lazy zip object over ('a', 1), ('b', 2), ('c', 3), ('d', 4)
# reverse
a_, b_ = zip(*zipped)
# verify
assert a == a_
assert b == b_
Here's a simple one-line answer that produces the desired output:
original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
list(zip(*original))
# [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
