Unnest/explode a list/tuple containing uneven sublists

I have a list/tuple containing uneven sublists (query result from SQLAlchemy), which looks like this:
x = [('x', [1,2], [3,4])]
I want to unnest/explode x as the following:
x = [('x',[1],[3]),('x',[2],[4])]
Or
x = [('x',1,3),('x',2,4)]
I can achieve this using pandas dataframes with the following:
df = pd.DataFrame(x, columns=['X','A','B'])
df = df.apply(lambda x: x.explode() if x.name in ['A', 'B'] else x)
print([tuple(i) for i in df.values.tolist()])
which generates the following output:
[('x', 1, 3), ('x', 2, 4)]
However, I would love to know whether a pure-Python solution is possible. I have been playing around with a list comprehension based on the following answer, with no luck.
[item for sublist in x for item in sublist]
Any help would be appreciated.
Edit:
my input looks like this:
[(u'x.x#gmail.com', u'Contact Info Refused/Not provided was Documented', 0L,
[None, None, None], [1447748, 1447751, 1447750], 3L, [1491930], 'nce', 1, 2037)]
Expected output:
Just unpack the two sublists and keep everything else the same.
[(u'x.x#gmail.com',u'Contact Info Refused/Not provided was Documented','0L',None,1447748,3L,[1491930],'nce',1,2037),
(u'x.x#gmail.com',u'Contact Info Refused/Not provided was Documented','0L',None,1447751,3L,[1491930],'nce',1,2037),
(u'x.x#gmail.com',u'Contact Info Refused/Not provided was Documented','0L',None,1447750,3L,[1491930],'nce',1,2037)]

From itertools
import itertools
list(itertools.zip_longest(*x[0],fillvalue=x[0][0]))
Out[25]: [('x', 1, 3), ('x', 2, 4)]
# [list(itertools.zip_longest(*x[0],fillvalue=x[0][0])) for x in sublist]
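The zip_longest approach above works for the small example because 'x' is a one-character string; a longer string such as the e-mail address in the edited input would be iterated character by character. A minimal pure-Python sketch that only unpacks chosen columns might look like the following. The function name explode and the list_columns parameter are hypothetical, and the sketch assumes the selected sublists in a row have equal length (zip silently truncates to the shortest otherwise).

```python
def explode(rows, list_columns):
    """Yield one tuple per position of the sublists found at list_columns.

    list_columns is a hypothetical parameter: the indices of the columns
    whose sublists should be unpacked. All other columns are copied as-is.
    """
    columns = sorted(list_columns)
    for row in rows:
        # zip walks the selected sublists in parallel and stops at the shortest.
        for parts in zip(*(row[i] for i in columns)):
            replacement = dict(zip(columns, parts))
            yield tuple(replacement.get(i, value) for i, value in enumerate(row))


x = [('x', [1, 2], [3, 4])]
print(list(explode(x, [1, 2])))  # [('x', 1, 3), ('x', 2, 4)]
```

For the edited input, passing the indices of the two three-element sublists (3 and 4) would leave the one-element list in column 6 untouched.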

Related

Sort a list alphabetically and retrieve initial index in python

I've been trying to sort a list of names alphabetically, let's say:
list=['Bob','Alice','Charlie']
print(list.index('Alice'))
1
However I'd also like to keep track of the original indexes, so this won't work:
list.sort()
print(list)
['Alice','Bob','Charlie']
print(list.index('Alice'))
0
After sorting the indexes changed; is there any way of keeping track of the original indexes? I've checked other similar questions and numpy has a solution, but not useful for str variables.

Just sort the reversed (index, name) tuples from enumerate to keep track of the elements and their indices:
>>> names = ['Bob','Alice','Charlie']
>>> sorted((name, index) for index, name in enumerate(names))
[('Alice', 1), ('Bob', 0), ('Charlie', 2)]

l = ['Bob','Alice','Charlie']
e = enumerate(l) # lazy iterator yielding (0, 'Bob'), (1, 'Alice'), (2, 'Charlie')
sl = sorted(e, key=lambda x: x[1]) # [(1, 'Alice'), (0, 'Bob'), (2, 'Charlie')]

You may create another list of indices and sort that one, leaving the original untouched:
>>> a = ['Bob', 'Alice', 'Charlie']
>>> idx = list(range(len(a)))
>>> idx
[0, 1, 2]
>>> sorted( idx, key=lambda x : a[x] )
[1, 0, 2]
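The index-sorting idea above can be wrapped in a small helper. This is only a sketch; the name argsort is borrowed from NumPy for illustration and is not a built-in.

```python
def argsort(values):
    """Return the original indices of values, ordered by the sorted values."""
    return sorted(range(len(values)), key=values.__getitem__)


names = ['Bob', 'Alice', 'Charlie']
order = argsort(names)
print(order)                       # [1, 0, 2]
print([names[i] for i in order])   # ['Alice', 'Bob', 'Charlie']
```

The original list is left untouched, so names.index('Alice') still returns 1 afterwards.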

You could create a nested dictionary of sorts to hold the original index and sorted value.
First, I would recommend using a different name for your list object: list is the name of a built-in type in Python (not a keyword), and reusing it shadows the built-in.
names=['Bob','Alice','Charlie']
name_dict = {name : {'unsorted' : idx} for idx,name in enumerate(names)}
for sorted_idx, name in enumerate(sorted(names)):
    name_dict[name].update({'sorted' : sorted_idx})
print(name_dict['Bob']['sorted'])
1
print(name_dict['Bob']['unsorted'])
0
print(name_dict)
{'Bob': {'unsorted': 0, 'sorted': 1},
'Alice': {'unsorted': 1, 'sorted': 0},
'Charlie': {'unsorted': 2, 'sorted': 2}}

Sort a list alphabetically and retrieve the first element in Python:
l = ['Bob', 'Alice', 'Charlie']
def sort_and_get_first_element(list1):
    list1.sort()  # sorts in place, so the caller's list is reordered too
    return list1[0]
print(sort_and_get_first_element(l))

Yes, you can keep track of initial index but with different data structure
a = ['Bob','Alice','Charlie']
l = sorted(enumerate(a), key=lambda i: i[1])
print(l)
Now the sorted list that keeps track of the initial index is:
[(1, 'Alice'), (0, 'Bob'), (2, 'Charlie')]

Set Ascending Descending for each column when sorting by multiple columns [duplicate]

This question already has answers here:
Sort by multiple keys using different orderings [duplicate]
(3 answers)
Closed 4 years ago.
I have a list d that I wish to sort. I sort by the first column first. If there is a tie, I then use the second column. Say I want to sort by the first column in ascending order but by the second column in descending order. Ascending being the default, I thought passing a tuple to the reverse argument, as below, would work.
sorted(d,key=lambda x: (x[0],x[1]),reverse=(False,True))
But it does not. It gives the following error:
reverse=(False,True))
TypeError: an integer is required (got type tuple)
So if I'm not doing it right, how do I fix it? Or is the way to do this completely different? Advice on that would be helpful.
My question indeed has some duplication but there are already interesting responses, so I would like to keep it.

(Might be overkill, but..) Using pandas and args ascending=[True, False]:
import pandas as pd
d = [[1,2], [2,2], [2,3], [2,4], [3,1]]
df= pd.DataFrame(d)
sorted_values = df.sort_values(by=[0,1], ascending=[True,False])
sorted_list = sorted_values.agg(list,1).tolist()
[[1, 2], [2, 4], [2, 3], [2, 2], [3, 1]]

From the docs:
reverse is a boolean value. If set to True, then the list elements are sorted as if each comparison were reversed.
So what you want instead is something like:
d.sort(key=lambda x: (x[0], -x[1]))
If x[1] is not a number, try:
d.sort(key=lambda x: x[1], reverse=True)
d.sort(key=lambda x: x[0])
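The two-pass version works because Python's sort is stable: rows that compare equal on the later key keep the order produced by the earlier pass. A sketch that generalizes this to any number of columns follows; the helper name sort_mixed and the (index, descending) spec format are made up for illustration.

```python
from operator import itemgetter


def sort_mixed(rows, specs):
    """Sort rows by several columns, each with its own direction.

    specs is a list of (column_index, descending) pairs, most significant
    first. The passes run from the least significant key to the most
    significant one, relying on the stability of list.sort.
    """
    result = list(rows)
    for index, descending in reversed(specs):
        result.sort(key=itemgetter(index), reverse=descending)
    return result


d = [(1, 'b'), (1, 'd'), (2, 'a'), (1, 'a')]
# First column ascending, second column descending.
print(sort_mixed(d, [(0, False), (1, True)]))
# [(1, 'd'), (1, 'b'), (1, 'a'), (2, 'a')]
```

Unlike negating the key, this also works for strings and other non-numeric columns.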

My take on solution:
from itertools import groupby, chain
l = ((1, 'b'),
     (1, 'd'),
     (2, 'a'),
     (1, 'a'))
def sort_multi(lst, index_normal, index_reversed):
    return list(chain.from_iterable(
        sorted(group, key=lambda v: v[index_reversed], reverse=True)
        for _, group in groupby(sorted(lst), key=lambda v: v[index_normal])))
print(sort_multi(l, 0, 1))
Outputs:
[(1, 'd'), (1, 'b'), (1, 'a'), (2, 'a')]

Maintaining the order of the elements in a frozen set

I have a list of tuples, each tuple of which contains one string and two integers. The list looks like this:
x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
The list contains thousands of such tuples. Now if I want to get unique combinations, I can do the frozenset on my list as follows:
y = set(map(frozenset, x))
This gives me the following result:
{frozenset({'a', 2, 1}), frozenset({'x', 5, 6}), frozenset({3, 'b', 4})}
I know that a set is an unordered data structure and that this is expected, but I want to preserve the order of the elements here so that I can then insert them into a pandas dataframe. The dataframe will look like this:
  Name  Marks1  Marks2
0    a       1       2
1    b       3       4
2    x       5       6

Instead of operating on the set of frozensets directly you could use that only as a helper data-structure - like in the unique_everseen recipe in the itertools section (copied verbatim):
from itertools import filterfalse
def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element
Basically this would solve the issue when you use key=frozenset:
>>> x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
>>> list(unique_everseen(x, key=frozenset))
[('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
This returns the elements as-is and it also maintains the relative order between the elements.

No ordering with frozensets. You can instead create sorted tuples to check for the existence of an item, adding the original if the tuple does not exist in the set:
y = set()
lst = []
for i in x:
    t = tuple(sorted(i, key=str))
    if t not in y:
        y.add(t)
        lst.append(i)
print(lst)
# [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
The first entry gets preserved.
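One difference between the two keys used so far: frozenset ignores repeated values inside a tuple, so ('a', 1, 1) and ('a', 1, 'a') would count as the same combination, whereas the sorted-tuple key keeps them apart. If that distinction matters and the values are hashable, a key built from collections.Counter is one possible alternative. This is only a sketch, and unique_combinations is a hypothetical name.

```python
from collections import Counter


def unique_combinations(rows):
    """Keep the first row of each combination, ignoring order inside a row.

    The key records how often each value occurs, so repeated values within
    a row are not collapsed the way a plain frozenset would collapse them.
    """
    seen = set()
    result = []
    for row in rows:
        key = frozenset(Counter(row).items())
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


x = [('a', 1, 2), ('b', 3, 4), ('x', 5, 6), ('a', 2, 1)]
print(unique_combinations(x))
# [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]
```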

There are some quite useful functions in NumPy which can help you to solve this problem.
import numpy as np
chrs, indices = np.unique(list(map(lambda x:x[0], x)), return_index=True)
chrs, indices
>> (array(['a', 'b', 'x'],
dtype='<U1'), array([0, 1, 2]))
[x[indices[i]] for i in range(indices.size)]
>> [('a', 1, 2), ('b', 3, 4), ('x', 5, 6)]

You can do it simply by using zip to maintain the order in the frozenset.
Give this a try:
l = ['col1','col2','col3','col4']
>>> frozenset(l)
--> frozenset({'col2', 'col4', 'col3', 'col1'})
>>> frozenset(zip(*zip(l)))
--> frozenset({('col1', 'col2', 'col3', 'col4')})
Taking an example from the question asked:
>>> x = [('a',1,2), ('b',3,4), ('x',5,6), ('a',2,1)]
>>> frozenset(zip(*zip(x)))
--> frozenset({(('a', 1, 2), ('b', 3, 4), ('x', 5, 6), ('a', 2, 1))})

Getting the first item of a tuple for each row in a list in PySpark

I'm a bit new to Spark and I am trying to do a simple mapping.
My data is like the following:
RDD((0, list(tuples)), ..., (19, list(tuples)))
What I want to do is grab the first item in each tuple, so ultimately something like this:
RDD((0, list(first item of each tuple)), ..., (19, list(first item of each tuple)))
Can someone help me out with how to map this?
I'll appreciate that!

You can use mapValues to convert the list of tuples to a list of tuple[0]:
rdd.mapValues(lambda x: [t[0] for t in x])

Something like this?
kv here meaning "key-value" and mapping itemgetter over the values. So, map within a map :-)
from operator import itemgetter
rdd = sc.parallelize([(0, [(0,'a'), (1,'b'), (2,'c')]), (1, [(3,'x'), (5,'y'), (6,'z')])])
mapped = rdd.mapValues(lambda v: list(map(itemgetter(0), v)))  # list() so Python 3 yields lists rather than map objects
Output
mapped.collect()
[(0, [0, 1, 2]), (1, [3, 5, 6])]

Compare two lists and return the differing indices and elements in Python

I want to compare two lists and return the differing indices and elements.
So I write the following code:
l1 = [1,1,1,1,1]
l2 = [1,2,1,1,3]
ind = []
diff = []
for i in range(len(l1)):
    if l1[i] != l2[i]:
        ind.append(i)
        diff.append([l1[i], l2[i]])
print(ind)
print(diff)
# output:
# [1, 4]
# [[1, 2], [1, 3]]
The code works, but are there any better ways to do that?
Update the Question:
I would like to ask for other solutions, for example with an iterator, or a ternary expression like [a,b](expression) (not the straightforward loop I wrote above; I want to exclude that). Thanks very much for your patience! :)

You could use a list comprehension to output all the information in a single list.
>>> [[idx, (i,j)] for idx, (i,j) in enumerate(zip(l1, l2)) if i != j]
[[1, (1, 2)], [4, (1, 3)]]
This will produce a list where each element is: [index, (first value, second value)] so all the information regarding a single difference is together.
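If the indices and the differing pairs are wanted as two separate lists, as in the question, the same comprehension can be split afterwards. A small sketch, with diff_lists as a made-up helper name:

```python
def diff_lists(a, b):
    """Return (indices, pairs) for the positions where a and b differ.

    zip stops at the shorter input, so trailing elements of a longer list
    are not compared.
    """
    mismatches = [(i, (x, y)) for i, (x, y) in enumerate(zip(a, b)) if x != y]
    indices = [i for i, _ in mismatches]
    pairs = [pair for _, pair in mismatches]
    return indices, pairs


l1 = [1, 1, 1, 1, 1]
l2 = [1, 2, 1, 1, 3]
ind, diff = diff_lists(l1, l2)
print(ind)   # [1, 4]
print(diff)  # [(1, 2), (1, 3)]
```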

An alternative way is the following
>>> l1 = [1,1,1,1,1]
>>> l2 = [1,2,1,1,3]
>>> z = zip(l1,l2)
>>> ind = [i for i, x in enumerate(z) if x[0] != x[1]]
>>> ind
[1, 4]
>>> diff = [z[i] for i in ind]
>>> diff
[(1, 2), (1, 3)]
In Python 3 you have to wrap zip in a call to list, e.g. z = list(zip(l1, l2)), because a zip object can be iterated only once and cannot be indexed.

You can try functional style:
res = list(filter(lambda item: item[1][0] != item[1][1], enumerate(zip(l1, l2))))
# [(1, (1, 2)), (4, (1, 3))]
To unzip res you can use:
list(zip(*res))
# [(1, 4), ((1, 2), (1, 3))]

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.
