Recursive sequential powerset function in Python

what I expect for [1,2,3] ->
[1]
[1, 2]
[1, 2, 3]
[1, 3]
[2]
[2, 3]
[3]
but my function gives this result, and I cannot fix it ->
def foo(L,first,last,Output):
    if first>=last:
        return
    for i in range(first ,last):
        print(Output+[L[i]])
        foo(L,first+1,last,Output+[L[i]])
foo([1,2,3],0,3,[])
[1]
[1, 2]
[1, 2, 3]
[1, 3]
[1, 3, 3]
[2]
[2, 2]
[2, 2, 3]
[2, 3]
[2, 3, 3]
[3]
[3, 2]
[3, 2, 3]
[3, 3]
[3, 3, 3]
In some situations I also want to stop extending a subset and continue with the others:
for example, once 1 and 2 appear together, do not continue with that subset
-> for [1,2,3] and (1,2)
what I expect :
[1]
[1, 3]
[2]
[2, 3]
[3]
An iterative function is also fine for me.

inductive reasoning
Another way to think about the problem avoids ranges, indexes, and incrementing them, which are a common source of off-by-one errors. Instead, we can reason about the problem inductively -
1. If the input t is empty, yield the empty set
2. (inductive) t has at least one element. For all p in the recursive sub-problem powerset(t[1:]), yield p and yield p with the first element, t[0], prepended.
def powerset(t):
    if not t:
        yield () # 1. empty t
    else:
        for p in powerset(t[1:]): # 2. at least one element
            yield p
            yield (t[0], *p)
By using yield we move the desired effect outside of the powerset function. This allows the caller to decide what happens with each produced set -
for p in powerset("abc"):
    print(p) # <- desired effect
()
('a',)
('b',)
('a', 'b')
('c',)
('a', 'c')
('b', 'c')
('a', 'b', 'c')
for p in powerset("abc"):
    print("".join(p)) # <- different effect
a
b
ab
c
ac
bc
abc
changing the order
I just want to process sequentially as I showed in the example.
The particular order you asked for can be achieved by reordering the yields. I also made an adjustment to remove the empty set from the output -
1. if the input t is empty, stop
2. (inductive) t has at least one element
   - yield the singleton set of the first element, t[0]
   - prepend the first element to each result of the sub-problem powerset(t[1:]) and yield
   - yield each result of the sub-problem powerset(t[1:])
def powerset(t):
    if not t:
        return # 1.
    else:
        yield (t[0],) # 2.
        yield from map(lambda p: (t[0], *p), powerset(t[1:]))
        yield from powerset(t[1:])
Notice above we compute powerset(t[1:]) twice. This is wasteful and can be avoided using itertools.tee -
from itertools import tee
def powerset(t):
    if not t: return
    yield (t[0],)
    left, right = tee(powerset(t[1:])) # <- tee left & right
    yield from map(lambda p: (t[0], *p), left) # <- left
    yield from right # <- right
for p in powerset("abc"):
    print(p)
('a',)
('a', 'b')
('a', 'b', 'c')
('a', 'c')
('b',)
('b', 'c')
('c',)
list of all subsets
Is it possible to do it without using yield? I need to keep it in a global list
Python uses iterables throughout its standard library. The idiomatic way is to use yield; however, it's easy to convert the result to a list using list -
result = list(powerset("abc"))
print(result)
[('a',), ('a', 'b'), ('a', 'b', 'c'), ('a', 'c'), ('b',), ('b', 'c'), ('c',)]
without using yield
If you have some compelling reason why powerset must return a list instead of an iterable, the transformation is elementary. Notice the structure of the program is identical -
def powerset(t):
    if not t: return []
    result = list(powerset(t[1:]))
    return [
        (t[0],),
        *map(lambda p: (t[0], *p), result),
        *result
    ]
print(powerset("abc"))
[('a',), ('a', 'b'), ('a', 'b', 'c'), ('a', 'c'), ('b',), ('b', 'c'), ('c',)]
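The question also asked for a way to stop extending a subset once a forbidden combination such as (1, 2) appears. Here is a minimal sketch of one way to do that, assuming the forbidden combination is passed as a frozenset. The names powerset_pruned, forbidden and prefix are made up for this example and are not part of the answer above. It recurses on the elements after position i, whereas the question's code recursed on first+1 instead of i+1, which is why elements were repeated -

```python
def powerset_pruned(t, forbidden=frozenset(), prefix=()):
    """Yield non-empty subsets of t in the question's order.

    Hypothetical helper: any subset containing every element of
    `forbidden` is skipped and never extended further.
    """
    for i, x in enumerate(t):
        candidate = prefix + (x,)
        # An empty `forbidden` disables pruning entirely.
        if forbidden and forbidden <= set(candidate):
            continue  # prune: neither emit nor extend this subset
        yield candidate
        # Only elements after position i may follow, so nothing repeats.
        yield from powerset_pruned(t[i + 1:], forbidden, candidate)


for p in powerset_pruned([1, 2, 3], frozenset({1, 2})):
    print(p)
```

Called without a forbidden set, the same function produces the plain sequential order asked for at the top of the question.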

Related

zip-like function that fails if a particular iterator is not consumed

I would like a zip like function that fails if the right-most iterator is not consumed. It should yield until the failure.
For example
>>> a = ['a', 'b', 'c']
>>> b = [1, 2, 3, 4]
>>> myzip(a, b)
Traceback (most recent call last):
...
ValueError: rightmost iterable was not consumed
>>> list(myzip(b, a))
[(1, 'a'), (2, 'b'), (3, 'c')]
Perhaps there is a function in the standard library that can help with this?
Important Note:
In the real context the iterators are not over objects so I can't just check the length or index them.
Edit:
This is what I have come up with so far
def myzip(*iterables):
    iters = [iter(i) for i in iterables]
    zipped = zip(*iters)
    try:
        next(iters[-1])
        raise ValueError('rightmost iterable was not consumed')
    except StopIteration:
        return zipped
Is this the best solution? It doesn't keep the state of the iterator because I call next on it, which might be a problem.
There are a few different ways you can go about doing this.
1. You could use the normal zip() with an iterator and manually check that it gets exhausted.
def check_consumed(it):
    try:
        next(it)
    except StopIteration:
        pass
    else:
        raise ValueError('rightmost iterable was not consumed')
b_it = iter(b)
list(zip(a, b_it))
check_consumed(b_it)
2. You could wrap the normal zip() to do the check for you.
def myzip(a, b):
    b_it = iter(b)
    yield from zip(a, b_it)
    # Or, if you're on a Python version that doesn't have yield from:
    #for item in zip(a, b_it):
    #    yield item
    check_consumed(b_it)
list(myzip(a, b))
3. You could write your own zip() from scratch, using iter() and next().
(No code for this one, as option 2 is superior to this one in every way)
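For what it's worth, the wrapping approach can be extended to any number of iterables. This is only a sketch, assuming the rightmost argument is the one that must be exhausted; the *iterables signature and the sentinel object are additions for illustration, not part of the answer above.

```python
def myzip(*iterables):
    """Zip the iterables, then fail if the rightmost one has items left.

    Sketch only: requires at least one iterable (unpacking fails otherwise).
    """
    *rest, last = iterables
    last_it = iter(last)
    # zip() advances its arguments left to right, so when an earlier
    # iterable runs out first, last_it has not been advanced for that round.
    yield from zip(*rest, last_it)
    sentinel = object()
    if next(last_it, sentinel) is not sentinel:
        raise ValueError('rightmost iterable was not consumed')


print(list(myzip([1, 2, 3, 4], ['a', 'b', 'c'])))
```

Because this is a generator, the ValueError is raised only after all the pairs have been yielded, which matches the "yield until the failure" requirement in the question.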
I think this one does the job by checking whether the last iterator was completely consumed before returning
# Example copied from https://stackoverflow.com/questions/19151/build-a-basic-python-iterator
class Counter:
    def __init__(self, low, high):
        self.current = low
        self.high = high
    def __iter__(self):
        return self
    def __next__(self): # Python 2: def next(self)
        if self.current > self.high:
            raise StopIteration
        else:
            self.current += 1
            return self.current - 1
# modified from https://docs.python.org/3.5/library/functions.html#zip
def myzip(*iterables):
    sentinel = object()
    iterators = [iter(it) for it in iterables]
    while iterators:
        result = []
        for it in iterators:
            elem = next(it, sentinel)
            if elem is sentinel:
                elem = next(iterators[-1], sentinel)
                if elem is not sentinel:
                    raise ValueError("rightmost iterable was not consumed")
                else:
                    return
            result.append(elem)
        yield tuple(result)
a = Counter(1,7)
b = range(9)
for val in myzip(a,b):
    print(val)
There is already a zip_longest in itertools that allows for "expansion" of the shorter iterable by a default value.
Use that and check if your default value occurs: if so, it would have been a case of "rightmost element not consumed":
class MyError(ValueError):
    """Unique "default" value that is recognizable and allows None to be in your values."""
    pass
from itertools import zip_longest
isMyError = lambda x:isinstance(x,MyError)
def myzip(a,b):
    """Raises MyError if any non-consumed elements would occur using default zip()."""
    K = zip_longest(a,b, fillvalue=MyError())
    if all(not isMyError(t) for q in K for t in q):
        return zip(a,b)
    raise MyError("Not all items are consumed")
a = ['a', 'b', 'c', 'd']
b = [1, 2, 3, 4]
f = myzip(a, b)
print(list(f))
try:
    a = ['a', 'b', ]
    b = [1, 2, 3, 4]
    f = myzip(a, b)
    print(list(f))
except MyError as e:
    print(e)
Output:
[('a', 1), ('b', 2), ('c', 3), ('d', 4)]
Not all items are consumed
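This consumes (worst case) the full zipped list once to check and then returns it as an iterable. Because the inputs are iterated twice, it only works for re-iterable inputs such as lists; one-shot iterators would already be exhausted by the check.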
Another option uses zip_longest from itertools. It also returns True or False depending on whether all lists were consumed. It may not be the most efficient way, but it could be improved:
from itertools import zip_longest
a = ['a', 'b', 'c', 'd']
b = [1, 2, 3, 4, 5]
c = ['aa', 'bb', 'cc', 'dd', 'ee', 'ff']
def myzip(*iterables):
    consumed = True
    zips = []
    for zipped in zip_longest(*iterables):
        if None in zipped:
            consumed = False
        else:
            zips.append(zipped)
    return [zips, consumed]
list(myzip(a, b, c))
#=> [[('a', 1, 'aa'), ('b', 2, 'bb'), ('c', 3, 'cc'), ('d', 4, 'dd')], False]

Maintaining the order of the elements in a frozen set

I have a list of tuples, each tuple of which contains one string and two integers. The list looks like this:
x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
The list contains thousands of such tuples. Now if I want to get unique combinations, I can apply frozenset to each element of my list as follows:
y = set(map(frozenset, x))
This gives me the following result:
{frozenset({'a', 2, 1}), frozenset({'x', 5, 6}), frozenset({3, 'b', 4})}
I know that a set is an unordered data structure and that this is expected, but I want to preserve the order of the elements so that I can then insert them into a pandas DataFrame. The DataFrame will look like this:
   Name  Marks1  Marks2
0     a       1       2
1     b       3       4
2     x       5       6
Instead of operating on the set of frozensets directly you could use that only as a helper data-structure - like in the unique_everseen recipe in the itertools section (copied verbatim):
from itertools import filterfalse
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Basically this would solve the issue when you use key=frozenset:
>>> x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
>>> list(unique_everseen(x, key=frozenset))
[('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
This returns the elements as-is and it also maintains the relative order between the elements.
Frozensets have no ordering. You can instead create sorted tuples to check for the existence of an item, adding the original if the tuple does not exist in the set:
y = set()
lst = []
for i in x:
    t = tuple(sorted(i, key=str))
    if t not in y:
        y.add(t)
        lst.append(i)
print(lst)
# [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
The first entry gets preserved.
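A variation on the same idea, offered only as a sketch: since Python 3.7 a plain dict keeps insertion order, so it can serve as both the "seen" set and the ordered result. The variable names are just for illustration. It assumes, like the frozenset approach in the question, that two tuples count as duplicates when they contain the same set of values.

```python
x = [('a', 1, 2), ('b', 3, 4), ('x', 5, 6), ('a', 2, 1)]

first_seen = {}
for tup in x:
    # setdefault stores tup only if this combination has not appeared yet,
    # so the first occurrence wins and insertion order is preserved.
    first_seen.setdefault(frozenset(tup), tup)

unique = list(first_seen.values())
print(unique)
```

The resulting list of tuples can then be passed on as rows, for example to a DataFrame constructor.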
There are some quite useful functions in NumPy which can help you to solve this problem.
import numpy as np
chrs, indices = np.unique(list(map(lambda x:x[0], x)), return_index=True)
chrs, indices
>> (array(['a', 'b', 'x'],
dtype='<U1'), array([0, 1, 2]))
[x[indices[i]] for i in range(indices.size)]
>> [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
You can do it by simply using zip to maintain the order in the frozenset.
Give this a try.
l = ['col1','col2','col3','col4']
>>> frozenset(l)
--> frozenset({'col2', 'col4', 'col3', 'col1'})
>>> frozenset(zip(*zip(l)))
--> frozenset({('col1', 'col2', 'col3', 'col4')})
Taking an example from the question asked:
>>> x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
>>> frozenset(zip(*zip(x)))
--> frozenset({(('a', 1, 2), ('b', 3, 4), ('x', 5, 6), ('a', 2, 1))})

Flattening a list that contains lists of tuples, letters and integers

I need to flatten a list recursively:
Before the list looks like:
L=[1,[2,['a',(3,'b')]],(5,6),([11,22])]
After:
Lflat=[1,2,'a',(3,'b'),(5,6),([11,22])]
I've come across an issue with my code (lst1 is an empty list)
def list_flatten(lst,lst1):
    for item in lst:
        if type(item) == tuple:
            print(item)
            lst1.append(item)
        elif type(item) == list:
            list_flatten(item,lst1)
        else:
            lst1.append(item)
    return lst1
This returns the following:
OUTPUT : [1, 2, 'a', (3, 'b'), (5, 6), 11, 22]
Which led me to find out that ([]) is considered a list, not a tuple.
Now my questions are as follows:
Say I were to define lst1=[] inside the main program. How do I make
it so the recursion doesn't empty the list out every iteration?
Why is ([]) considered a list?
Your list_flatten function mutates the lst1 argument, so you don't really need to return anything. You can call it like this:
L = [1,[2,['a',(3,'b')]],(5,6),([11,22])]
def list_flatten(lst, lst1):
    for item in lst:
        if isinstance(item, list):
            list_flatten(item, lst1)
        else:
            lst1.append(item)
Lflat = []
list_flatten(L, Lflat)
print(Lflat)
output
[1, 2, 'a', (3, 'b'), (5, 6), 11, 22]
It's recommended to use isinstance rather than type because that makes the code more versatile: it will also work with objects derived from list.
We can re-write the function so that you don't need to pass in lst1:
def list_flatten(lst, lst1=None):
    if lst1 is None:
        lst1 = []
    for item in lst:
        if isinstance(item, list):
            list_flatten(item, lst1)
        else:
            lst1.append(item)
    return lst1
Lflat = list_flatten(L)
print(Lflat)
We give lst1 a default value of None and on the top level of the recursion we re-bind the name lst1 to an empty list to collect the results.
We can't give lst1 a default value of []. That's because default argument values are evaluated once, when the def statement is executed, not each time the function is called, so if we gave lst1 a default value of [] that same list would get used on every call. It would look like it does what we want the first time we used list_flatten, but it would not behave as desired on subsequent calls. Here's a short demo.
L = [1,[2,['a',(3,'b')]],(5,6),([11,22])]
def list_flatten(lst, lst1=[]):
    for item in lst:
        if isinstance(item, list):
            list_flatten(item, lst1)
        else:
            lst1.append(item)
    return lst1
Lflat = list_flatten(L)
print(Lflat)
Lflat = list_flatten(L)
print(Lflat)
output
[1, 2, 'a', (3, 'b'), (5, 6), 11, 22]
[1, 2, 'a', (3, 'b'), (5, 6), 11, 22, 1, 2, 'a', (3, 'b'), (5, 6), 11, 22]
As you can see, lst1 has retained its contents from the first call. For more info on this important topic, please see “Least Astonishment” and the Mutable Default Argument. There are times when this behaviour is desirable, but in such cases it's wise to add a comment to your code that you're intentionally using a mutable default argument.
Yet another way is to make list_flatten into a generator, and collect its output into a list:
def list_flatten(lst):
    for item in lst:
        if isinstance(item, list):
            yield from list_flatten(item)
        else:
            yield item
Lflat = list(list_flatten(L))
print(Lflat)
In recent versions of Python you can replace list(list_flatten(L)) with [*list_flatten(L)].
Python 2 doesn't have yield from, but you can replace that line with:
for u in list_flatten(item):
    yield u
If you don't actually need the list you can call the generator like this:
for u in list_flatten(L):
    print(u)
output
1
2
a
(3, 'b')
(5, 6)
11
22
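As for the second part of the question, why ([]) is considered a list: parentheses on their own only group an expression, and it is the comma that makes a tuple. A short illustration (the variable names are only for this demo):

```python
a = ([11, 22])    # parentheses only group the expression: still a list
b = ([11, 22],)   # the trailing comma makes a one-element tuple
c = ()            # the empty tuple is the one case that needs no comma

print(type(a).__name__, type(b).__name__, type(c).__name__)
```

So to keep [11, 22] intact inside a tuple in the input, it would have to be written as ([11, 22],).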
Note that what you need is:
if the list is empty, return it unchanged
if it has one single element and that element is not a list, return the list unchanged
else flatten the first element, flatten the rest of the list and concatenate the two sublists
In Python code, it leads to:
def flatten(L):
    if len(L) == 0: return L
    elif len(L) == 1 and not isinstance(L[0], list): return L
    else:
        return (flatten(L[0] if isinstance(L[0], list)
                        else [L[0]]) + flatten(L[1:]))
As expected, it gives:
>>> L = [1, 2, 'a', (3, 'b'), (5, 6), ([11, 22],)]
>>> flatten(L)
[1, 2, 'a', (3, 'b'), (5, 6), ([11, 22],)]

zip list with a single element

I have a list of some elements, e.g. [1, 2, 3, 4] and a single object, e.g. 'a'. I want to produce a list of tuples with the elements of the list in the first position and the single object in the second position: [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')].
I could do it with zip like this:
def zip_with_scalar(l, o): # l - the list; o - the object
    return list(zip(l, [o] * len(l)))
However, this gives me the feeling of creating an unnecessary list of repeated elements.
Another possibility is
def zip_with_scalar(l, o):
    return [(i, o) for i in l]
which is very clean and pythonic indeed, but here I do the whole thing "manually". In Haskell I would do something like
zipWithScalar l o = zip l $ repeat o
Is there any built-in function or trick, either for the zipping with scalar or for something that would enable me to use ordinary zip, i.e. sort-of infinite list?
This is the closest to your Haskell solution:
import itertools
def zip_with_scalar(l, o):
    return zip(l, itertools.repeat(o))
You could also use a generator expression, which avoids building a list the way a list comprehension does:
def zip_with_scalar(l, o):
    return ((i, o) for i in l)
You can use the built-in map function (in Python 3, map returns an iterator, so wrap it in list() to get a list):
>>> elements = [1, 2, 3, 4]
>>> key = 'a'
>>> list(map(lambda e: (e, key), elements))
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
This is a perfect job for the itertools.cycle class.
from itertools import cycle
def zip_with_scalar(l, o):
    return zip(l, cycle([o])) # wrap o in a list so any object, not just a string, is repeated
Demo:
>>> from itertools import cycle
>>> l = [1, 2, 3, 4]
>>> list(zip(l, cycle('a')))
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
lst = [1,2,3,4]
tups = [(itm, 'a') for itm in lst]
tups
> [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
>>> l = [1, 2, 3, 4]
>>> list(zip(l, "a"*len(l)))
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
You could also use zip_longest with a fillvalue of o:
from itertools import zip_longest
def zip_with_scalar(l, o): # l - the list; o - the object
    return zip_longest(l, [o], fillvalue=o)
print(list(zip_with_scalar([1, 2, 3, 4] ,"a")))
Just be aware that any mutable values used for o won't be copied whether using zip_longest or repeat.
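To make that caveat concrete, here is a small sketch (the names shared and pairs are only for this demo) showing that every tuple ends up holding a reference to the very same object:

```python
from itertools import repeat

shared = []  # a mutable "scalar"
pairs = list(zip([1, 2, 3], repeat(shared)))

# Mutating the object through one tuple is visible through all of them,
# because repeat() hands out the same reference each time.
pairs[0][1].append('x')
print(pairs)
```

If independent copies are needed, something like [(i, list(shared)) for i in l] would create a fresh list per tuple instead.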
The more-itertools library recently added a zip_broadcast() function that solves this problem well:
>>> from more_itertools import zip_broadcast
>>> list(zip_broadcast([1,2,3,4], 'a'))
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
This is a much more general solution than the other answers posted here:
Empty iterables are correctly handled.
There can be multiple iterable and/or scalar arguments.
The order of the scalar/iterable arguments doesn't need to be known.
If there are multiple iterable arguments, you can check that they are the same length with strict=True.
You can easily control whether or not strings should be treated as iterables (by default they are not).
Just define a class that acts as an infinite iterator, initialized with the single element you want to inject into the list:
class zipIterator:
    def __init__(self, val):
        self.__val = val
    def __iter__(self):
        return self
    def __next__(self):
        return self.__val
and then create your new list from this class and the lists you have:
elements = [1, 2, 3, 4]
key = 'a'
res = [it for it in zip(elements, zipIterator(key))]
the result would be:
>>res
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]

Transpose/Unzip Function (inverse of zip)?

I have a list of 2-item tuples and I'd like to convert them to 2 lists where the first contains the first item in each tuple and the second list holds the second item.
For example:
original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
# and I want to become...
result = (['a', 'b', 'c', 'd'], [1, 2, 3, 4])
Is there a builtin function that does that?
In 2.x, zip is its own inverse! Provided you use the special * operator.
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
This is equivalent to calling zip with each element of the list as a separate argument:
zip(('a', 1), ('b', 2), ('c', 3), ('d', 4))
except the arguments are passed to zip directly (after being converted to a tuple), so there's no need to worry about the number of arguments getting too big.
In 3.x, zip returns a lazy iterator, but this is trivially converted:
>>> list(zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)]))
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
You could also do
result = ([ a for a,b in original ], [ b for a,b in original ])
It should scale better. Especially if Python makes good on not expanding the list comprehensions unless needed.
(Incidentally, it makes a 2-tuple (pair) of lists, rather than a list of tuples, like zip does.)
If generators instead of actual lists are ok, this would do that:
result = (( a for a,b in original ), ( b for a,b in original ))
The generators don't munch through the list until you ask for each element, but on the other hand, they do keep references to the original list.
I like to use zip(*iterable) (which is the piece of code you're looking for) in my programs as so:
def unzip(iterable):
    return zip(*iterable)
I find unzip more readable.
If you have lists that are not the same length, you may not want to use zip as per Patrick's answer. This works:
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
But with different length lists, zip truncates each item to the length of the shortest list:
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e')]
You can use map with no function to fill empty results with None:
>>> map(None, *[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e'), (1, 2, 3, 4, None)]
zip() is marginally faster though.
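Note that map(None, ...) only works in Python 2. In Python 3 the same padding behaviour is available from itertools.zip_longest; a short sketch using the data from above (the variable names are only for illustration):

```python
from itertools import zip_longest

pairs = [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e',)]

# zip_longest pads the shorter columns with fillvalue (None by default)
# instead of truncating to the shortest one like zip does.
padded = list(zip_longest(*pairs))
print(padded)
```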
To get a tuple of lists, as in the question:
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple([list(tup) for tup in zip(*original)])
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])
To unpack the two lists into separate variables:
list1, list2 = [list(tup) for tup in zip(*original)]
Naive approach
def transpose_finite_iterable(iterable):
    return zip(*iterable) # `itertools.izip` for Python 2 users
works fine for finite iterable (e.g. sequences like list/tuple/str) of (potentially infinite) iterables which can be illustrated like
| |a_00| |a_10| ... |a_n0| |
| |a_01| |a_11| ... |a_n1| |
| |... | |... | ... |... | |
| |a_0i| |a_1i| ... |a_ni| |
| |... | |... | ... |... | |
where
n in ℕ,
a_ij corresponds to j-th element of i-th iterable,
and after applying transpose_finite_iterable we get
| |a_00| |a_01| ... |a_0i| ... |
| |a_10| |a_11| ... |a_1i| ... |
| |... | |... | ... |... | ... |
| |a_n0| |a_n1| ... |a_ni| ... |
Python example of such case where a_ij == j, n == 2
>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterable(iterable)
>>> next(result)
(0, 0)
>>> next(result)
(1, 1)
But we can't use transpose_finite_iterable again to return to structure of original iterable because result is an infinite iterable of finite iterables (tuples in our case):
>>> transpose_finite_iterable(result)
... hangs ...
Traceback (most recent call last):
File "...", line 1, in ...
File "...", line 2, in transpose_finite_iterable
MemoryError
So how can we deal with this case?
... and here comes the deque
After taking a look at the docs of the itertools.tee function, there is a Python recipe that, with some modification, can help in our case
from collections import deque
def transpose_finite_iterables(iterable):
    iterator = iter(iterable)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]
    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()
    return tuple(map(coordinate, queues))
let's check
>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterables(transpose_finite_iterable(iterable))
>>> result
(<generator object transpose_finite_iterables.<locals>.coordinate at ...>, <generator object transpose_finite_iterables.<locals>.coordinate at ...>)
>>> next(result[0])
0
>>> next(result[0])
1
Synthesis
Now we can define a general function for working with iterables of iterables, where one level is finite and the other is potentially infinite, using the functools.singledispatch decorator:
from collections import (abc,
                         deque)
from functools import singledispatch
@singledispatch
def transpose(object_):
    """
    Transposes given object.
    """
    raise TypeError('Unsupported object type: {type}.'
                    .format(type=type(object_)))
@transpose.register(abc.Iterable)
def transpose_finite_iterables(object_):
    """
    Transposes given iterable of finite iterables.
    """
    iterator = iter(object_)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]
    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()
    return tuple(map(coordinate, queues))
def transpose_finite_iterable(object_):
    """
    Transposes given finite iterable of iterables.
    """
    yield from zip(*object_)
try:
    transpose.register(abc.Collection, transpose_finite_iterable)
except AttributeError:
    # Python3.5-
    transpose.register(abc.Mapping, transpose_finite_iterable)
    transpose.register(abc.Sequence, transpose_finite_iterable)
    transpose.register(abc.Set, transpose_finite_iterable)
which can be considered as its own inverse (mathematicians call this kind of functions "involutions") in class of binary operators over finite non-empty iterables.
As a bonus of singledispatching we can handle numpy arrays like
import numpy as np
...
transpose.register(np.ndarray, np.transpose)
and then use it like
>>> array = np.arange(4).reshape((2,2))
>>> array
array([[0, 1],
[2, 3]])
>>> transpose(array)
array([[0, 2],
[1, 3]])
Note
Since transpose returns iterators, anyone who wants a tuple of lists as in the OP can get it with the map built-in function:
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple(map(list, transpose(original)))
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])
Advertisement
I've added generalized solution to lz package from 0.5.0 version which can be used like
>>> from lz.transposition import transpose
>>> list(map(tuple, transpose(zip(range(10), range(10, 20)))))
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)]
P.S.
There is no solution (at least obvious) for handling potentially infinite iterable of potentially infinite iterables, but this case is less common though.
It's only another way to do it but it helped me a lot so I write it here:
Having this data structure:
X=[1,2,3,4]
Y=['a','b','c','d']
XY=zip(X,Y)
Resulting in:
In: XY
Out: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
The more pythonic way to unzip it and go back to the original is this one in my opinion:
x,y=zip(*XY)
But this returns tuples, so if you need lists you can use:
x,y=(list(x),list(y))
Consider using more_itertools.unzip:
>>> from more_itertools import unzip
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> [list(x) for x in unzip(original)]
[['a', 'b', 'c', 'd'], [1, 2, 3, 4]]
None of the previous answers efficiently provide the required output, which is a tuple of lists, rather than a list of tuples. For the former, you can use tuple with map. Here's the difference:
res1 = list(zip(*original)) # [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
res2 = tuple(map(list, zip(*original))) # (['a', 'b', 'c', 'd'], [1, 2, 3, 4])
In addition, most of the previous solutions assume Python 2.7, where zip returns a list rather than an iterator.
For Python 3.x, you will need to pass the result to a function such as list or tuple to exhaust the iterator. For memory-efficient iterators, you can omit the outer list and tuple calls for the respective solutions.
While numpy arrays and pandas may be preferable, this function imitates the behavior of zip(*args) when called as unzip(args).
It allows generators, like the result from zip in Python 3, to be passed as args, since it iterates through the values.
def unzip(items, cls=list, ocls=tuple):
    """Zip function in reverse.
    :param items: Zipped-like iterable.
    :type items: iterable
    :param cls: Container factory. Callable that returns iterable containers,
        with a callable append attribute, to store the unzipped items. Defaults
        to ``list``.
    :type cls: callable, optional
    :param ocls: Outer container factory. Callable that returns iterable
        containers to store the inner containers (see ``cls``).
        Defaults to ``tuple``.
    :type ocls: callable, optional
    :returns: Unzipped items in instances returned from ``cls``, in an instance
        returned from ``ocls``.
    """
    # iter() will return the same iterator passed to it whenever possible.
    items = iter(items)
    try:
        i = next(items)
    except StopIteration:
        return ocls()
    unzipped = ocls(cls([v]) for v in i)
    for i in items:
        for c, v in zip(unzipped, i):
            c.append(v)
    return unzipped
To use list containers, simply run unzip(zipped), as
unzip(zip(["a","b","c"],[1,2,3])) == (["a","b","c"],[1,2,3])
To use deques, or any other container sporting append, pass a factory function.
from collections import deque
unzip([("a",1),("b",2)], deque, list) == [deque(["a","b"]),deque([1,2])]
(Decorate cls and/or ocls to micromanage container initialization, as briefly shown in the final comparison above.)
Since it returns tuples (and can use tons of memory), the zip(*zipped) trick seems more clever than useful, to me.
Here's a function that will actually give you the inverse of zip.
def unzip(zipped):
    """Inverse of built-in zip function.
    Args:
        zipped: a list of tuples
    Returns:
        a tuple of lists
    Example:
        a = [1, 2, 3]
        b = [4, 5, 6]
        zipped = list(zip(a, b))
        assert zipped == [(1, 4), (2, 5), (3, 6)]
        unzipped = unzip(zipped)
        assert unzipped == ([1, 2, 3], [4, 5, 6])
    """
    unzipped = ()
    if len(zipped) == 0:
        return unzipped
    dim = len(zipped[0])
    for i in range(dim):
        unzipped = unzipped + ([tup[i] for tup in zipped], )
    return unzipped
While zip(*seq) is very useful, it may be unsuitable for very long sequences as it will create a tuple of values to be passed in. For example, I've been working with a coordinate system with over a million entries and find it significantly faster to create the sequences directly.
A generic approach would be something like this:
from collections import deque
seq = ((a1, b1, …), (a2, b2, …), …)
width = len(seq[0])
output = [deque() for _ in range(width)] # one independent deque per column
for element in seq:
    for s, item in zip(output, element):
        s.append(item)
But, depending on what you want to do with the result, the choice of collection can make a big difference. In my actual use case, using sets and no internal loop, is noticeably faster than all other approaches.
And, as others have noted, if you are doing this with datasets, it might make sense to use Numpy or Pandas collections instead.
Just to summarize:
# data
a = ('a', 'b', 'c', 'd')
b = (1, 2, 3, 4)
# forward
zipped = zip(a, b) # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
# reverse
a_, b_ = zip(*zipped)
# verify
assert a == a_
assert b == b_
Here's a simple one-line answer that produces the desired output:
original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
list(zip(*original))
# [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
