Maintaining the order of the elements in a frozen set - python

I have a list of tuples, each tuple of which contains one string and two integers. The list looks like this:
x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
The list contains thousands of such tuples. Now if I want to get unique combinations, I can do the frozenset on my list as follows:
y = set(map(frozenset, x))
This gives me the following result:
{frozenset({'a', 2, 1}), frozenset({'x', 5, 6}), frozenset({3, 'b', 4})}
I know that set is an unordered data structure and this is normal case but I want to preserve the order of the elements here so that I can thereafter insert the elements in a pandas dataframe. The dataframe will look like this:
Name Marks1 Marks2
0 a 1 2
1 b 3 4
2 x 5 6

Instead of operating on the set of frozensets directly you could use that only as a helper data-structure - like in the unique_everseen recipe in the itertools section (copied verbatim):
from itertools import filterfalse
def unique_everseen(iterable, key=None):
"List unique elements, preserving order. Remember all elements ever seen."
# unique_everseen('AAAABBBCCDAABBB') --> A B C D
# unique_everseen('ABBCcAD', str.lower) --> A B C D
seen = set()
seen_add = seen.add
if key is None:
for element in filterfalse(seen.__contains__, iterable):
seen_add(element)
yield element
else:
for element in iterable:
k = key(element)
if k not in seen:
seen_add(k)
yield element
Basically this would solve the issue when you use key=frozenset:
>>> x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
>>> list(unique_everseen(x, key=frozenset))
[('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
This returns the elements as-is and it also maintains the relative order between the elements.

No ordering with frozensets. You can instead create sorted tuples to check for the existence of an item, adding the original if the tuple does not exist in the set:
y = set()
lst = []
for i in x:
t = tuple(sorted(i, key=str)
if t not in y:
y.add(t)
lst.append(i)
print(lst)
# [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
The first entry gets preserved.

There are some quite useful functions in NumPy which can help you to solve this problem.
import numpy as np
chrs, indices = np.unique(list(map(lambda x:x[0], x)), return_index=True)
chrs, indices
>> (array(['a', 'b', 'x'],
dtype='<U1'), array([0, 1, 2]))
[x[indices[i]] for i in range(indices.size)]
>> [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]

You can do it by simple using the zip to maintain the order in the frozenset.
Give this a try pls.
l = ['col1','col2','col3','col4']
>>> frozenset(l)
--> frozenset({'col2', 'col4', 'col3', 'col1'})
>>> frozenset(zip(*zip(l)))
--> frozenset({('col1', 'col2', 'col3', 'col4')})
Taking an example from the question asked:
>>> x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
>>> frozenset(zip(*zip(x)))
--> frozenset({(('a', 1, 2), ('b', 3, 4), ('x', 5, 6), ('a', 2, 1))})

Related

Sort list of iterables by nth element which might not be present in all of them

Given a list of iterables:
li = [(1,2), (3,4,8), (3,4,7), (9,)]
I want to sort by the third element if present, otherwise leave the order unchanged. So here the desired output would be:
[(1,2), (3,4,7), (3,4,8), (9,)]
Using li.sort(key=lambda x:x[2]) returns an IndexError. I tried a custom function:
def safefetch(li, idx):
try:
return li[idx]
except IndexError:
return # (ie return None)
li.sort(key=lambda x: safefetch(x, 2))
But None in sorting yields a TypeError.
Broader context: I first want to sort by the first element, then the second, then the third, etc. until the length of the longest element, ie I want to run several sorts of decreasing privilege (as in SQL's ORDER BY COL1 , COL2), while preserving order among those elements that aren't relevant. So: first sort everything by first element; then among the ties on el_1 sort on el_2, etc.. until el_n. My feeling is that calling a sort function on the whole list is probably the wrong approach.
(Note that this was an "XY question": for my actual question, just using sorted on tuples is simplest, as Patrick Artner pointed out in the comments. But the question is posed is trickier.)
We can first get the indices for distinct lengths of elements in the list via a defaultdict and then sort each sublist with numpy's fancy indexing:
from collections import defaultdict
# {length -> inds} mapping
d = defaultdict(list)
# collect indices per length
for j, tup in enumerate(li):
d[len(tup)].append(j)
# sort
li = np.array(li, dtype=object)
for inds in d.values():
li[inds] = sorted(li[inds])
# convert back to list if desired
li = li.tolist()
to get li at the end as
[(1, 2), (3, 4, 7), (3, 4, 8), (9,)]
For some other samples:
In [134]: the_sorter([(12,), (3,4,8), (3,4,7), (9,)])
Out[134]: [(9,), (3, 4, 7), (3, 4, 8), (12,)]
In [135]: the_sorter([(12,), (3,4,8,9), (3,4,7), (11, 9), (9, 11), (2, 4, 4, 4)])
Out[135]: [(12,), (2, 4, 4, 4), (3, 4, 7), (9, 11), (11, 9), (3, 4, 8, 9)]
where the_sorter is above procedure wrapped in a function (name lacks imagination...)
def the_sorter(li):
# {length -> inds} mapping
d = defaultdict(list)
# collect indices per length
for j, tup in enumerate(li):
d[len(tup)].append(j)
# sort
li = np.array(li)
for inds in d.values():
li[inds] = sorted(li[inds])
return li.tolist()
Whatever you return as fallback value must be comparable to the other key values that might be returned. In your example that would require a numerical value.
import sys
def safefetch(li, idx):
try:
return li[idx]
except IndexError:
return sys.maxsize # largest int possible
This would put all the short ones at the back of the sort order, but maintain a stable order among them.
Inspired by #Mustafa Aydın here is a solution in Pandas. Would prefer one without the memory overhead of a dataframe, but this might be good enough.
import pandas as pd
li = [(1,2), (3,4,8), (3,4,7), (9,)]
tmp = pd.DataFrame(li)
[tuple(int(el) for el in t if not pd.isna(el)) for t in tmp.sort_values(by=tmp.columns.tolist()).values]
> [(1, 2), (3, 4, 7), (3, 4, 8), (9,)]

Access individual elements of tuples of dictionary keys

Considering the code snippet below -
list1 = [1,2,3,4]
list2 = [1,2,3,4]
list3 = ['a','b','c','d']
dct = dict(zip(zip(list1,list2),list3))
print(dct)
gives me,
{(1, 1): 'a', (2, 2): 'b', (3, 3): 'c', (4, 4): 'd'}
Now,
print(dct.keys())
gives me,
dict_keys([(1, 1), (2, 2), (3, 3), (4, 4)])
How can i access first element of the above list of keys?
Something like -
dct.keys[0, 0] = 1
dct.keys[0, 1] = 1
dct.keys[1, 0] = 2
dct.keys[1, 2] = 2
and so on...
Remember that a dict is unordered, and that dict.keys() may change order.
That said, to access the first element of a list, as you said, you can use list[element_index]. If the elemnt is an iterable, do that again!
So it would be
dct_keys = list(yourdict.keys())
dct_keys[0][0] = 1
dct_keys[0][1] = 1
dct_keys[1][0] = 2
dct_keys[1][1] = 2
You need to first convert the dct.keys() output to a list, and then the problem reduces to simple list-of-tuples indexing. To convert your .keys() output to a list, there are multiple available ways (check this out). Personally, I find using list comprehension as one of the simplest and most generic ways:
>>> [key for key in dct.keys()]
[(1, 1), (2, 2), (3, 3), (4, 4)]
And now simply index this list of tuples as:
>>> [key for key in dct.keys()][0][0]
1
Hope that helps.

How can I remove duplicate tuples from a list based on index value of tuple while maintaining the order of tuple? [duplicate]

This question already has answers here:
How do I remove duplicates from a list, while preserving order?
(30 answers)
Closed 4 years ago.
I want to remove those tuples which had same values at index 0 except the first occurance. I looked at other similar questions but did not get a particular answer I am looking for. Can somebody please help me?
Below is what I tried.
from itertools import groupby
import random
Newlist = []
abc = [(1,2,3), (2,3,4), (1,0,3),(0,2,0), (2,4,5),(5,4,3), (0,4,1)]
Newlist = [random.choice(tuple(g)) for _, g in groupby(abc, key=lambda x: x[0])]
print Newlist
my expected output : [(1,2,3), (2,3,4), (0,2,0), (5,4,3)]
A simple way is to loop over the list and keep track of which elements you've already found:
abc = [(1,2,3), (2,3,4), (1,0,3),(0,2,0), (2,4,5),(5,4,3), (0,4,1)]
found = set()
NewList = []
for a in abc:
if a[0] not in found:
NewList.append(a)
found.add(a[0])
print(NewList)
#[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
found is a set. At each iteration we check if the first element in the tuple is already in found. If not, we append the whole tuple to NewList. At the end of each iteration we add the first element of the tuple to found.
A better alternative using OrderedDict:
from collections import OrderedDict
abc = [(1,2,3), (2,3,4), (1,0,3), (0,2,0), (2,4,5),(5,4,3), (0,4,1)]
d = OrderedDict()
for t in abc:
d.setdefault(t[0], t)
abc_unique = list(d.values())
print(abc_unique)
Output:
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
Simple although not very efficient:
abc = [(1,2,3), (2,3,4), (1,0,3), (0,2,0), (2,4,5),(5,4,3), (0,4,1)]
abc_unique = [t for i, t in enumerate(abc) if not any(t[0] == p[0] for p in abc[:i])]
print(abc_unique)
Output:
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
The itertools recipes (Python 2: itertools recipes, but basically no difference in this case) contains a recipe for this, which is a bit more general than the implementation by #pault. It also uses a set:
Python 2:
from itertools import ifilterfalse as filterfalse
Python 3:
from itertools import filterfalse
def unique_everseen(iterable, key=None):
"List unique elements, preserving order. Remember all elements ever seen."
# unique_everseen('AAAABBBCCDAABBB') --> A B C D
# unique_everseen('ABBCcAD', str.lower) --> A B C D
seen = set()
seen_add = seen.add
if key is None:
for element in filterfalse(seen.__contains__, iterable):
seen_add(element)
yield element
else:
for element in iterable:
k = key(element)
if k not in seen:
seen_add(k)
yield element
Use it with:
abc = [(1,2,3), (2,3,4), (1,0,3),(0,2,0), (2,4,5),(5,4,3), (0,4,1)]
Newlist = list(unique_everseen(abc, key=lambda x: x[0]))
print Newlist
# [(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
This should be slightly faster because of the caching of the set.add method (only really relevant if your abc is large) and should also be more general because it makes the key function a parameter.
Apart from that, the same limitation I already mentioned in a comment applies: this only works if the first element of the tuple is actually hashable (which numbers, like in the given example, are, of course).
#PatrickHaugh claims:
but the question is explicitly about maintaining the order of the
tuples. I don't think there's a solution using groupby
I never miss an opportunity to (ab)use groupby(). Here's my solution sans sorting (once or twice):
from itertools import groupby, chain
abc = [(1, 2, 3), (2, 3, 4), (1, 0, 3), (0, 2, 0), (2, 4, 5), (5, 4, 3), (0, 4, 1)]
Newlist = list((lambda s: chain.from_iterable(g for f, g in groupby(abc, lambda k: s.get(k[0]) != s.setdefault(k[0], True)) if f))({}))
print(Newlist)
OUTPUT
% python3 test.py
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
%
To use groupby correctly, the sequence must be sorted:
>>> [next(g) for k,g in groupby(sorted(abc, key=lambda x:x[0]), key=lambda x:x[0])]
[(0, 2, 0), (1, 2, 3), (2, 3, 4), (5, 4, 3)]
or if you need that very exact order of your example (i.e. maintaining original order):
>>> [t[2:] for t in sorted([next(g) for k,g in groupby(sorted([(t[0], i)+t for i,t in enumerate(abc)]), lambda x:x[0])], key=lambda x:x[1])]
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]
the trick here is to add one field for keeping the original order to restore after the groupby() step.
Edit: even a bit shorter:
>>> [t[1:] for t in sorted([next(g)[1:] for k,g in groupby(sorted([(t[0], i)+t for i,t in enumerate(abc)]), lambda x:x[0])])]
[(1, 2, 3), (2, 3, 4), (0, 2, 0), (5, 4, 3)]

zip list with a single element

I have a list of some elements, e.g. [1, 2, 3, 4] and a single object, e.g. 'a'. I want to produce a list of tuples with the elements of the list in the first position and the single object in the second position: [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')].
I could do it with zip like this:
def zip_with_scalar(l, o): # l - the list; o - the object
return list(zip(l, [o] * len(l)))
However, this gives me a feeling of creating and unnecessary list of repeating element.
Another possibility is
def zip_with_scalar(l, o):
return [(i, o) for i in l]
which is very clean and pythonic indeed, but here I do the whole thing "manually". In Haskell I would do something like
zipWithScalar l o = zip l $ repeat o
Is there any built-in function or trick, either for the zipping with scalar or for something that would enable me to use ordinary zip, i.e. sort-of infinite list?
This is the cloest to your Haskell solution:
import itertools
def zip_with_scalar(l, o):
return zip(l, itertools.repeat(o))
You could also use generators, which avoid creating a list like comprehensions do:
def zip_with_scalar(l, o):
return ((i, o) for i in l)
You can use the built-in map function:
>>> elements = [1, 2, 3, 4]
>>> key = 'a'
>>> map(lambda e: (e, key), elements)
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
This is a perfect job for the itertools.cycle class.
from itertools import cycle
def zip_with_scalar(l, o):
return zip(i, cycle(o))
Demo:
>>> from itertools import cycle
>>> l = [1, 2, 3, 4]
>>> list(zip(l, cycle('a')))
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
lst = [1,2,3,4]
tups = [(itm, 'a') for itm in lst]
tups
> [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
>>> l = [1, 2, 3, 4]
>>> list(zip(l, "a"*len(l)))
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
You could also use zip_longest with a fillvalue of o:
from itertools import zip_longest
def zip_with_scalar(l, o): # l - the list; o - the object
return zip_longest(l, [o], fillvalue=o)
print(list(zip_with_scalar([1, 2, 3, 4] ,"a")))
Just be aware that any mutable values used for o won't be copied whether using zip_longest or repeat.
The more-itertools library recently added a zip_broadcast() function that solves this problem well:
>>> from more_itertools import zip_broadcast
>>> list(zip_broadcast([1,2,3,4], 'a'))
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]
This is a much more general solution than the other answers posted here:
Empty iterables are correctly handled.
There can be multiple iterable and/or scalar arguments.
The order of the scalar/iterable arguments doesn't need to be known.
If there are multiple iterable arguments, you can check that they are the same length with strict=True.
You can easily control whether or not strings should be treated as iterables (by default they are not).
Just define a class with infinite iterator which is initialized with the single element you want to injected in the lists:
class zipIterator:
def __init__(self, val):
self.__val = val
def __iter__(self):
return self
def __next__(self):
return self.__val
and then create your new list from this class and the lists you have:
elements = [1, 2, 3, 4]
key = 'a'
res = [it for it in zip(elements, zipIterator(key))]
the result would be:
>>res
[(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')]

Transpose/Unzip Function (inverse of zip)?

I have a list of 2-item tuples and I'd like to convert them to 2 lists where the first contains the first item in each tuple and the second list holds the second item.
For example:
original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
# and I want to become...
result = (['a', 'b', 'c', 'd'], [1, 2, 3, 4])
Is there a builtin function that does that?
In 2.x, zip is its own inverse! Provided you use the special * operator.
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
This is equivalent to calling zip with each element of the list as a separate argument:
zip(('a', 1), ('b', 2), ('c', 3), ('d', 4))
except the arguments are passed to zip directly (after being converted to a tuple), so there's no need to worry about the number of arguments getting too big.
In 3.x, zip returns a lazy iterator, but this is trivially converted:
>>> list(zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)]))
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
You could also do
result = ([ a for a,b in original ], [ b for a,b in original ])
It should scale better. Especially if Python makes good on not expanding the list comprehensions unless needed.
(Incidentally, it makes a 2-tuple (pair) of lists, rather than a list of tuples, like zip does.)
If generators instead of actual lists are ok, this would do that:
result = (( a for a,b in original ), ( b for a,b in original ))
The generators don't munch through the list until you ask for each element, but on the other hand, they do keep references to the original list.
I like to use zip(*iterable) (which is the piece of code you're looking for) in my programs as so:
def unzip(iterable):
return zip(*iterable)
I find unzip more readable.
If you have lists that are not the same length, you may not want to use zip as per Patricks answer. This works:
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
But with different length lists, zip truncates each item to the length of the shortest list:
>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e')]
You can use map with no function to fill empty results with None:
>>> map(None, *[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e'), (1, 2, 3, 4, None)]
zip() is marginally faster though.
To get a tuple of lists, as in the question:
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple([list(tup) for tup in zip(*original)])
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])
To unpack the two lists into separate variables:
list1, list2 = [list(tup) for tup in zip(*original)]
Naive approach
def transpose_finite_iterable(iterable):
return zip(*iterable) # `itertools.izip` for Python 2 users
works fine for finite iterable (e.g. sequences like list/tuple/str) of (potentially infinite) iterables which can be illustrated like
| |a_00| |a_10| ... |a_n0| |
| |a_01| |a_11| ... |a_n1| |
| |... | |... | ... |... | |
| |a_0i| |a_1i| ... |a_ni| |
| |... | |... | ... |... | |
where
n in ℕ,
a_ij corresponds to j-th element of i-th iterable,
and after applying transpose_finite_iterable we get
| |a_00| |a_01| ... |a_0i| ... |
| |a_10| |a_11| ... |a_1i| ... |
| |... | |... | ... |... | ... |
| |a_n0| |a_n1| ... |a_ni| ... |
Python example of such case where a_ij == j, n == 2
>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterable(iterable)
>>> next(result)
(0, 0)
>>> next(result)
(1, 1)
But we can't use transpose_finite_iterable again to return to structure of original iterable because result is an infinite iterable of finite iterables (tuples in our case):
>>> transpose_finite_iterable(result)
... hangs ...
Traceback (most recent call last):
File "...", line 1, in ...
File "...", line 2, in transpose_finite_iterable
MemoryError
So how can we deal with this case?
... and here comes the deque
After we take a look at docs of itertools.tee function, there is Python recipe that with some modification can help in our case
def transpose_finite_iterables(iterable):
iterator = iter(iterable)
try:
first_elements = next(iterator)
except StopIteration:
return ()
queues = [deque([element])
for element in first_elements]
def coordinate(queue):
while True:
if not queue:
try:
elements = next(iterator)
except StopIteration:
return
for sub_queue, element in zip(queues, elements):
sub_queue.append(element)
yield queue.popleft()
return tuple(map(coordinate, queues))
let's check
>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterables(transpose_finite_iterable(iterable))
>>> result
(<generator object transpose_finite_iterables.<locals>.coordinate at ...>, <generator object transpose_finite_iterables.<locals>.coordinate at ...>)
>>> next(result[0])
0
>>> next(result[0])
1
Synthesis
Now we can define general function for working with iterables of iterables ones of which are finite and another ones are potentially infinite using functools.singledispatch decorator like
from collections import (abc,
deque)
from functools import singledispatch
#singledispatch
def transpose(object_):
"""
Transposes given object.
"""
raise TypeError('Unsupported object type: {type}.'
.format(type=type))
#transpose.register(abc.Iterable)
def transpose_finite_iterables(object_):
"""
Transposes given iterable of finite iterables.
"""
iterator = iter(object_)
try:
first_elements = next(iterator)
except StopIteration:
return ()
queues = [deque([element])
for element in first_elements]
def coordinate(queue):
while True:
if not queue:
try:
elements = next(iterator)
except StopIteration:
return
for sub_queue, element in zip(queues, elements):
sub_queue.append(element)
yield queue.popleft()
return tuple(map(coordinate, queues))
def transpose_finite_iterable(object_):
"""
Transposes given finite iterable of iterables.
"""
yield from zip(*object_)
try:
transpose.register(abc.Collection, transpose_finite_iterable)
except AttributeError:
# Python3.5-
transpose.register(abc.Mapping, transpose_finite_iterable)
transpose.register(abc.Sequence, transpose_finite_iterable)
transpose.register(abc.Set, transpose_finite_iterable)
which can be considered as its own inverse (mathematicians call this kind of functions "involutions") in class of binary operators over finite non-empty iterables.
As a bonus of singledispatching we can handle numpy arrays like
import numpy as np
...
transpose.register(np.ndarray, np.transpose)
and then use it like
>>> array = np.arange(4).reshape((2,2))
>>> array
array([[0, 1],
[2, 3]])
>>> transpose(array)
array([[0, 2],
[1, 3]])
Note
Since transpose returns iterators and if someone wants to have a tuple of lists like in OP -- this can be made additionally with map built-in function like
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple(map(list, transpose(original)))
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])
Advertisement
I've added generalized solution to lz package from 0.5.0 version which can be used like
>>> from lz.transposition import transpose
>>> list(map(tuple, transpose(zip(range(10), range(10, 20)))))
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)]
P.S.
There is no solution (at least obvious) for handling potentially infinite iterable of potentially infinite iterables, but this case is less common though.
It's only another way to do it but it helped me a lot so I write it here:
Having this data structure:
X=[1,2,3,4]
Y=['a','b','c','d']
XY=zip(X,Y)
Resulting in:
In: XY
Out: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]
The more pythonic way to unzip it and go back to the original is this one in my opinion:
x,y=zip(*XY)
But this return a tuple so if you need a list you can use:
x,y=(list(x),list(y))
Consider using more_itertools.unzip:
>>> from more_itertools import unzip
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> [list(x) for x in unzip(original)]
[['a', 'b', 'c', 'd'], [1, 2, 3, 4]]
None of the previous answers efficiently provide the required output, which is a tuple of lists, rather than a list of tuples. For the former, you can use tuple with map. Here's the difference:
res1 = list(zip(*original)) # [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
res2 = tuple(map(list, zip(*original))) # (['a', 'b', 'c', 'd'], [1, 2, 3, 4])
In addition, most of the previous solutions assume Python 2.7, where zip returns a list rather than an iterator.
For Python 3.x, you will need to pass the result to a function such as list or tuple to exhaust the iterator. For memory-efficient iterators, you can omit the outer list and tuple calls for the respective solutions.
While numpy arrays and pandas may be preferrable, this function imitates the behavior of zip(*args) when called as unzip(args).
Allows for generators, like the result from zip in Python 3, to be passed as args as it iterates through values.
def unzip(items, cls=list, ocls=tuple):
"""Zip function in reverse.
:param items: Zipped-like iterable.
:type items: iterable
:param cls: Container factory. Callable that returns iterable containers,
with a callable append attribute, to store the unzipped items. Defaults
to ``list``.
:type cls: callable, optional
:param ocls: Outer container factory. Callable that returns iterable
containers. with a callable append attribute, to store the inner
containers (see ``cls``). Defaults to ``tuple``.
:type ocls: callable, optional
:returns: Unzipped items in instances returned from ``cls``, in an instance
returned from ``ocls``.
"""
# iter() will return the same iterator passed to it whenever possible.
items = iter(items)
try:
i = next(items)
except StopIteration:
return ocls()
unzipped = ocls(cls([v]) for v in i)
for i in items:
for c, v in zip(unzipped, i):
c.append(v)
return unzipped
To use list cointainers, simply run unzip(zipped), as
unzip(zip(["a","b","c"],[1,2,3])) == (["a","b","c"],[1,2,3])
To use deques, or other any container sporting append, pass a factory function.
from collections import deque
unzip([("a",1),("b",2)], deque, list) == [deque(["a","b"]),deque([1,2])]
(Decorate cls and/or main_cls to micro manage container initialization, as briefly shown in the final assert statement above.)
Since it returns tuples (and can use tons of memory), the zip(*zipped) trick seems more clever than useful, to me.
Here's a function that will actually give you the inverse of zip.
def unzip(zipped):
"""Inverse of built-in zip function.
Args:
zipped: a list of tuples
Returns:
a tuple of lists
Example:
a = [1, 2, 3]
b = [4, 5, 6]
zipped = list(zip(a, b))
assert zipped == [(1, 4), (2, 5), (3, 6)]
unzipped = unzip(zipped)
assert unzipped == ([1, 2, 3], [4, 5, 6])
"""
unzipped = ()
if len(zipped) == 0:
return unzipped
dim = len(zipped[0])
for i in range(dim):
unzipped = unzipped + ([tup[i] for tup in zipped], )
return unzipped
While zip(*seq) is very useful, it may be unsuitable for very long sequences as it will create a tuple of values to be passed in. For example, I've been working with a coordinate system with over a million entries and find it signifcantly faster to create the sequences directly.
A generic approach would be something like this:
from collections import deque
seq = ((a1, b1, …), (a2, b2, …), …)
width = len(seq[0])
output = [deque(len(seq))] * width # preallocate memory
for element in seq:
for s, item in zip(output, element):
s.append(item)
But, depending on what you want to do with the result, the choice of collection can make a big difference. In my actual use case, using sets and no internal loop, is noticeably faster than all other approaches.
And, as others have noted, if you are doing this with datasets, it might make sense to use Numpy or Pandas collections instead.
Just to summarize:
# data
a = ('a', 'b', 'c', 'd')
b = (1, 2, 3, 4)
# forward
zipped = zip(a, b) # [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
# reverse
a_, b_ = zip(*zipped)
# verify
assert a == a_
assert b == b_
Here's a simple one-line answer that produces the desired output:
original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
list(zip(*original))
# [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]

Categories

Resources