Grouping two lists in python

I have two lists of lists that I want to group by the first element of each inner list.
list1 = [['1','abc','zef'],['2','qwerty','opo'],['3','lol','pop']]
list2 = [['1','rofl','pole'],['2','sole','pop'],['3','lmao','wtf']]
Here the first elements of the inner lists are '1', '2' and '3'.
I want my final list to look like this:
Final_List = [['1', 'abc', 'zef', 'rofl', 'pole'], ['3', 'lol', 'pop', 'lmao', 'wtf'], ['2', 'qwerty', 'opo', 'sole', 'pop']]
I tried the following code:
#!/usr/bin/python
list1 = [['1','abc','zef'],['2','qwerty','opo'],['3','lol','pop']]
list2 = [['1','rofl','pole'],['2','sole','pop'],['3','lmao','wtf']]
d = {}
for i in list1:
    d[i[0]] = i[1:]
for i in list2:
    d[i[0]].extend(i[1:])
Final_List = []
for key, value in d.items():
    value.insert(0, key)
    Final_List.append(value)
This code works, but I was wondering whether there is an easier and cleaner way to do it.
Any help?

I would have written it the way you did, with a small modification.
Prepare a dictionary that maps each first element to a list starting with that element, followed by the remaining elements of every matching sublist:
d = {}
for items in (list1, list2):
    for item in items:
        d.setdefault(item[0], [item[0]]).extend(item[1:])
Then just take all the values from the dictionary (thanks @jamylak):
print(d.values())
Output
[['3', 'lol', 'pop', 'lmao', 'wtf'],
['1', 'abc', 'zef', 'rofl', 'pole'],
['2', 'qwerty', 'opo', 'sole', 'pop']]
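The setdefault idea above can be wrapped in a small reusable function. This is only a sketch: the name merge_by_key is made up for illustration, and it assumes Python 3.7+, where dictionaries keep insertion order, so the groups come out in first-seen order.

```python
def merge_by_key(*lists):
    """Merge rows that share the same first element, in first-seen order."""
    merged = {}
    for rows in lists:
        for row in rows:
            key = row[0]
            # Start each group with the key itself, then append the remaining fields.
            merged.setdefault(key, [key]).extend(row[1:])
    return list(merged.values())


list1 = [['1', 'abc', 'zef'], ['2', 'qwerty', 'opo'], ['3', 'lol', 'pop']]
list2 = [['1', 'rofl', 'pole'], ['2', 'sole', 'pop'], ['3', 'lmao', 'wtf']]
final_list = merge_by_key(list1, list2)
print(final_list)
```

Because rows are matched by key rather than by position, the two input lists do not need to be in the same order.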

If the order of the items inside each sublist of Final_List is not important, and both lists are aligned by index, this one-liner can be used:
[list(set(sum(itm, []))) for itm in zip(list1, list2)]
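If the order does matter, a similar one-liner can keep it. The sketch below is an assumption-laden variant, not the original answer: merge_aligned_unique is a hypothetical name, it relies on dict.fromkeys preserving insertion order (Python 3.7+), and like the set version it assumes both lists are aligned by index and it drops every repeated value, not only the key.

```python
def merge_aligned_unique(list1, list2):
    """Concatenate rows pairwise and drop repeated values, keeping order."""
    # dict.fromkeys keeps the first occurrence of each value in insertion order
    return [list(dict.fromkeys(a + b)) for a, b in zip(list1, list2)]


list1 = [['1', 'abc', 'zef'], ['2', 'qwerty', 'opo'], ['3', 'lol', 'pop']]
list2 = [['1', 'rofl', 'pole'], ['2', 'sole', 'pop'], ['3', 'lmao', 'wtf']]
merged = merge_aligned_unique(list1, list2)
print(merged)
```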

Your code seems correct. Just modify the following portion:
Final_List = []
for key in d:
    L = [key] + [x for x in d[key]]
    Final_List.append(L)

Yes, with a list comprehension and enumerate:
list1 = [['1','abc','zef'],['2','qwerty','opo'],['3','lol','pop']]
list2 = [['1','rofl','pole'],['2','sole','pop'],['3','lmao','wtf']]
# list2[k][1:] skips the repeated key; this assumes both lists are in the same order
print([v + list2[k][1:] for k, v in enumerate(list1)])
[['1', 'abc', 'zef', 'rofl', 'pole'], ['2', 'qwerty', 'opo', 'sole', 'pop'], ['3', 'lol', 'pop', 'lmao', 'wtf']]
EDIT
If the lists are not in the same order, match the sublists by their first element instead of by index:
list1 = [['1','abc','zef'],['2','qwerty','opo'],['3','lol','pop']]
list2 = [['1','rofl','pole'],['3','lmao','wtf'],['2','sole','pop']]
d1 = {a[0]: a for a in list1}
d2 = {a[0]: a for a in list2}
print([v + d2[k][1:] for k, v in d1.items()])

Using defaultdict and a list comprehension, you can shorten your code:
from collections import defaultdict
list1 = [['1','abc','zef'],['2','qwerty','opo'],['3','lol','pop']]
list2 = [['1','rofl','pole'],['2','sole','pop'],['3','lmao','wtf']]
d = defaultdict(list)
for i in list1 + list2:
    d[i[0]].extend(i[1:])
Final_List = [[key] + value for key, value in d.items()]
print(Final_List)

list3 = []
# iterate over the rows; both lists are assumed to be aligned by index
for i in range(min(len(list1), len(list2))):
    list3.append(list(list1[i]))
    list3[i].extend(x for x in list2[i] if x not in list3[i])
With a single indexed loop, you only iterate through the lists once.

A bit of functional style:
import operator, itertools
from pprint import pprint
one = [['1','abc','zef'],['2','qwerty','opo'],['3','lol','pop']]
two = [['1','rofl','pole'],['2','sole','pop'],['3','lmao','wtf']]
A few helpers:
zero = operator.itemgetter(0)
all_but_the_first = operator.itemgetter(slice(1, None))
data = (one, two)
def foo(group):
    # group is (key, iterator) from itertools.groupby
    key = group[0]
    lists = group[1]
    result = [key]  # list(key) would split a multi-character key into characters
    for item in lists:
        result.extend(all_but_the_first(item))
    return result
Function to process the data:
def process(data, func = foo):
    # concatenate all the sublists
    new = itertools.chain(*data)
    # group by item zero
    three = sorted(new, key = zero)
    groups = itertools.groupby(three, zero)
    # apply func to each group to build the new lists
    return map(func, groups)
Usage
>>> pprint(list(process(data)))
[['1', 'abc', 'zef', 'rofl', 'pole'],
 ['2', 'qwerty', 'opo', 'sole', 'pop'],
 ['3', 'lol', 'pop', 'lmao', 'wtf']]
>>>
>>> for thing in process(data):
...     print(thing)
...
['1', 'abc', 'zef', 'rofl', 'pole']
['2', 'qwerty', 'opo', 'sole', 'pop']
['3', 'lol', 'pop', 'lmao', 'wtf']
>>>

list1 = [['1','abc','zef'],['2','qwerty','opo'],['3','lol','pop']]
list2 = [['1','rofl','pole'],['2','sole','pop'],['3','lmao','wtf']]
Final_List = []
for i in range(0, len(list1)):
    Final_List.append(list1[i] + list2[i])
    # drop the repeated key from list2[i] (index 3 because each list1 row has 3 items)
    del Final_List[i][3]
print(Final_List)
Output
[['1', 'abc', 'zef', 'rofl', 'pole'], ['2', 'qwerty', 'opo', 'sole', 'pop'], ['3', 'lol', 'pop', 'lmao', 'wtf']]

Related

Split list in python when same values occurs into a list of sublists

Using Python, I need to split my_list = ['1','2','2','3','3','3','4','4','5'] into a list of sublists such that no sublist contains the same value twice. Correct output = [['1','2','3','4','5'],['2','3','4'],['3']]

Probably not the most efficient approach but effective nonetheless:
my_list = ['1','2','2','3','3','3','4','4','5']
output = []
for e in my_list:
    for f in output:
        if e not in f:
            f.append(e)
            break
    else:
        output.append([e])
print(output)
Output:
[['1', '2', '3', '4', '5'], ['2', '3', '4'], ['3']]

I assumed you are indexing every unique element with its occurrence and also sorted the result list to better suit your desired output.
uniques = list(set(my_list))
uniques.sort()
unique_counts = {unique:my_list.count(unique) for unique in uniques}
new_list = []
for _ in range(max(unique_counts.values())):
    new_list.append([])
for unique, count in unique_counts.items():
    for i in range(count):
        new_list[i].append(unique)
The output for new_list is
[['1', '2', '3', '4', '5'], ['2', '3', '4'], ['3']]

Use collections.Counter to determine how many sublists are needed (the highest frequency), then distribute each unique key across the sublists according to its frequency:
from collections import Counter
my_list = ['1','2','2','3','3','3','4','4','5']
cnts = Counter(my_list)
res = [[] for i in range(cnts.most_common(1).pop()[1])]
for k in cnts.keys():
    for j in range(cnts[k]):
        res[j].append(k)
print(res)
[['1', '2', '3', '4', '5'], ['2', '3', '4'], ['3']]

Here's a way to do it based on getting unique values and counts using list comprehension.
my_list = ['1','2','2','3','3','3','4','4','5']
unique = [val for i,val in enumerate(my_list) if val not in my_list[0:i]]
counts = [my_list.count(val) for val in unique]
output = [[val for val,ct in zip(unique, counts) if ct > i] for i in range(max(counts))]
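The approaches above call my_list.count or scan earlier slices, which repeats work on long lists. A single-pass sketch is shown below, under the assumption that the nth occurrence of a value should simply land in the nth sublist; split_by_occurrence is a hypothetical name, not taken from any of the answers.

```python
from collections import defaultdict


def split_by_occurrence(values):
    """Put the nth occurrence of each value into the nth sublist."""
    seen = defaultdict(int)  # how many times each value has appeared so far
    result = []
    for value in values:
        index = seen[value]
        if index == len(result):
            # first time any value reaches this occurrence count
            result.append([])
        result[index].append(value)
        seen[value] += 1
    return result


my_list = ['1', '2', '2', '3', '3', '3', '4', '4', '5']
output = split_by_occurrence(my_list)
print(output)
```

Each element is visited once, and values keep the order in which they appear in the input.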

match line number of a 1d array with first element of a 2d array for large dataset in Python

I've two large lists - here I will show just an example to simplify.
In list1 I've some words,
list1 = ['hello','stack','overflow']
in list2 I've the line number of the words of list1, and a numerical value to identify the type of word.
list2= [['0','10'],['2', '11'],['4', '12']]
I would like to use the line number of list2
list2 = [['0','10'],['2', '11'],['4', '12']] #line numbers here are: 0,2,4
with the corresponding line in list1,
list1 = ['hello','stack','overflow'] #correspondences found here are: hello (for list2[0]) and overflow (for list2[1])
so that I can have a list3 with word and its tag.
list3 = [['hello','10'], ['overflow', '11']]
I found a way to cross both lists but it's very slow and I think not efficient at all. How might I simplify this lookup process?
list1 = ['hello','stack','overflow']
list2= [['0','10'],['2', '11'],['4', '12']]
for i in range(0, len(list1)):
    for k in range(0, len(list2)):
        if (str(list2[k][0]) == str(i)):
            print("Found "+str(list1[i]))
Found hello
Found overflow

IIUC, you could do:
list1 = ['hello', 'stack', 'overflow']
list2 = [['0', '10'], ['2', '11'], ['4', '12']]
# transform the line numbers to ints
line_numbers = [(int(l), e) for l, e in list2]
# filter and compound with other number
res = [[list1[ln], other] for ln, other in line_numbers if ln < len(list1)]
print(res)
Output
[['hello', '10'], ['overflow', '11']]

How to compare each element of multiple lists and return name of lists which are different

I have multiple lists, and I need to compare each list with the others and return the names of the lists that are different. The comparison should consider the values of the elements regardless of their position.
For example:-
Lis1=['1','2','3']
Lis2=['1','2']
Lis3=['0','1','3']
Lis4=[]
Lis5=['1','2']
Output:-
['Lis1','Lis2','Lis3','Lis4']
Thanks in advance.

Try this:
input_lists = {"Lis1": ['1', '2', '3'], "Lis2": ['1', '2'],
               "Lis3": ['0', '1', '3'], "Lis4": [], "Lis5": ['1', '2']}
output_lists = {}
for k, v in input_lists.items():
    if sorted(v) not in output_lists.values():
        output_lists[k] = sorted(v)
unique_keys = list(output_lists.keys())
print(unique_keys) # ['Lis1', 'Lis2', 'Lis3', 'Lis4']

import itertools
Lis1=['1','2','3']
Lis2=['1','2']
Lis3=['0','1','3']
Lis4=[]
Lis5=['1','2']
k=[Lis1,Lis2,Lis3,Lis4,Lis5]
k.sort()
list(k for k,_ in itertools.groupby(k))
output
[[], ['0', '1', '3'], ['1', '2'], ['1', '2', '3']]

A simple way to implement this:
Lis1=['1','2','3']
Lis2=['1','2']
Lis3=['0','1','3']
Lis4=[]
Lis5=['1','2']
lis=[Lis1,Lis2,Lis3,Lis4,Lis5]
final=[]
for ele in lis:
    if ele not in final:
        final.append(ele)
print(final)

with your given data you can use:
Lis1=['1','2','3']
Lis2=['1','2']
Lis3=['0','1','3']
Lis4=[]
Lis5=['1','2']
name_lis = {'Lis1': Lis1, 'Lis2': Lis2, 'Lis3': Lis3, 'Lis4': Lis4, 'Lis5': Lis5}
tmp = set()
response = []
for k, v in name_lis.items():
    s = ''.join(sorted(v))
    if s not in tmp:
        tmp.add(s)
        response.append(k)
print(response)
output:
['Lis1', 'Lis2', 'Lis3', 'Lis4']
The name_lis dictionary maps the name of each list to the list itself. For each list, the elements are sorted and joined into a string; if that string has been seen before, the list is a duplicate, otherwise its name is added to the response.
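One caveat with ''.join(sorted(v)) as a key: lists such as ['1', '2'] and ['12'] collapse to the same string. A possible variant, sketched here with the hypothetical name distinct_list_names, uses a sorted tuple as the key instead, which stays hashable and unambiguous. It assumes Python 3.7+ so that the dictionary is visited in insertion order.

```python
def distinct_list_names(named_lists):
    """Return the names of lists whose contents were not seen earlier."""
    seen = set()
    names = []
    for name, values in named_lists.items():
        key = tuple(sorted(values))  # ignores element order, stays hashable
        if key not in seen:
            seen.add(key)
            names.append(name)
    return names


name_lis = {'Lis1': ['1', '2', '3'], 'Lis2': ['1', '2'],
            'Lis3': ['0', '1', '3'], 'Lis4': [], 'Lis5': ['1', '2']}
response = distinct_list_names(name_lis)
print(response)
```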

How to erase duplicated element in row of a list in python

I don't know exactly how to explain this in the title, so here is some code to express what I need. I have a list like this:
lst = [['24', 'john', 'july', 'email#gmail.com'],
       ['12', 'alice', 'auguts', 'email#hotmail.com'],
       ['48', 'john', 'september', 'email#outlook.com'],
       [ ....]]
I want to erase every sublist whose name duplicates an earlier one (name being the second field in each sublist). In this case I want the final list to be:
lst = [['24', 'john', 'july', 'email#gmail.com'],
       ['12', 'alice', 'auguts', 'email#hotmail.com'],
       [ ....]]
I don't want to find a duplicated list and erase it; I want to erase any list that has a duplicated field. Sorry if I didn't explain myself well.
Thanks!

Use set to check duplicates.
>>> lst = [
... ['24', 'john', 'july', 'email#gmail.com'],
... ['12', 'alice', 'auguts', 'email#hotmail.com'],
... ['48', 'john', 'september', 'email#outlook.com'],
... ]
>>>
>>> seen = set()
>>> result = []
>>> for item in lst:
...     name = item[1]
...     if name not in seen:
...         seen.add(name)
...         result.append(item)
...
>>> result
[['24', 'john', 'july', 'email#gmail.com'],
['12', 'alice', 'auguts', 'email#hotmail.com']]
Don't use list as a variable name. It shadows builtin list.
>>> seen = set()
>>> [x for x in lst if (x[1] not in seen, seen.add(x[1]))[0]]
[['24', 'john', 'july', 'email#gmail.com'],
['12', 'alice', 'auguts', 'email#hotmail.com']]
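The same seen-set technique can be generalised to any field. The following is a minimal sketch; unique_by and its key parameter are names invented for this illustration, and the first row for each key is the one that is kept.

```python
def unique_by(rows, key):
    """Keep only the first row for each distinct key(row)."""
    seen = set()
    result = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            result.append(row)
    return result


lst = [
    ['24', 'john', 'july', 'email#gmail.com'],
    ['12', 'alice', 'auguts', 'email#hotmail.com'],
    ['48', 'john', 'september', 'email#outlook.com'],
]
result = unique_by(lst, key=lambda row: row[1])
print(result)
```

Passing a different key function, for example lambda row: row[3], would deduplicate on the last field instead.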

Using filter:
lst = [['24', 'john', 'july', 'email#gmail.com'],
['12', 'alice', 'auguts', 'email#hotmail.com'],
['48', 'john', 'september', 'email#outlook.com']
]
seen = {}
def filter_condition(item):
    if item[1] in seen: return False
    seen[item[1]] = 1
    return True
print(list(filter(filter_condition, lst)))

Here's a naive approach, renaming your starting list to oldlist to avoid a naming problem with the builtin Python list.
newlist = []
for j, sublist in enumerate(oldlist):
    unique = True
    for laterlist in oldlist[j+1:]:
        if any([sublist[k] == laterlist[k] for k in range(len(sublist))]):
            unique = False
    if unique:
        newlist.append(sublist)

Grouping lists within lists in Python 3

I have a list of lists of strings like so:
List1 = [
['John', 'Doe'],
['1','2','3'],
['Henry', 'Doe'],
['4','5','6']
]
That I would like to turn into something like this:
List1 = [
[ ['John', 'Doe'], ['1','2','3'] ],
[ ['Henry', 'Doe'], ['4','5','6'] ]
]
But I seem to be having trouble doing so.

List1 = [['John', 'Doe'], ['1','2','3'],
         ['Henry', 'Doe'], ['4','5','6'],
         ['Bob', 'Opoto'], ['8','9','10']]
def pairing(iterable):
    it = iter(iterable)
    for x in it:
        # next(it) consumes the element that follows x
        yield (x, next(it))
# The generator pairing(iterable) yields tuples:
for tu in pairing(List1):
    print(tu)
# produces:
(['John', 'Doe'], ['1', '2', '3'])
(['Henry', 'Doe'], ['4', '5', '6'])
(['Bob', 'Opoto'], ['8', '9', '10'])
# If you really want it to yield lists:
# (in Python 3, map returns a lazy iterator; Python 2 needed itertools.imap)
for li in map(list, pairing(List1)):
    print(li)
# or define pairing() to yield lists directly:
def pairing(iterable):
    it = iter(iterable)
    for x in it:
        yield [x, next(it)]
# produces:
[['John', 'Doe'], ['1', '2', '3']]
[['Henry', 'Doe'], ['4', '5', '6']]
[['Bob', 'Opoto'], ['8', '9', '10']]
Edit: Defining a generator function isn't required; you can pair up a list on the fly:
List1 = [['John', 'Doe'], ['1','2','3'],
         ['Henry', 'Doe'], ['4','5','6'],
         ['Bob', 'Opoto'], ['8','9','10']]
it = iter(List1)
List1 = [[x, next(it)] for x in it]

This should do what you want assuming you always want to take pairs of the inner lists together.
list1 = [['John', 'Doe'], ['1','2','3'], ['Henry', 'Doe'], ['4','5','6']]
output = [list(pair) for pair in zip(list1[::2], list1[1::2])]
It uses zip, which gives you tuples, but if you need it exactly as you've shown, in lists, the outer list comprehension does that.

Here it is in 8 lines. I used tuples rather than lists because it's the "correct" thing to do:
def pairUp(iterable):
    """
    [1,2,3,4,5,6] -> [(1,2),(3,4),(5,6)]
    """
    sequence = iter(iterable)
    for a in sequence:
        try:
            b = next(sequence)
        except StopIteration:
            raise Exception('tried to pair-up %s, but has odd number of items' % str(iterable))
        yield (a,b)
Demo:
>>> list(pairUp(range(0)))
[]
>>> list(pairUp(range(1)))
Exception: tried to pair-up [0], but has odd number of items
>>> list(pairUp(range(2)))
[(0, 1)]
>>> list(pairUp(range(3)))
Exception: tried to pair-up [0, 1, 2], but has odd number of items
>>> list(pairUp(range(4)))
[(0, 1), (2, 3)]
>>> list(pairUp(range(5)))
Exception: tried to pair-up [0, 1, 2, 3, 4], but has odd number of items
Concise method:
zip(sequence[::2], sequence[1::2])
# does not check for odd number of elements
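If an odd trailing item should be kept rather than rejected or silently dropped, itertools.zip_longest can pad the last pair. This is only a sketch of that alternative; pair_up_padded and its fillvalue default are assumptions made for the illustration, not part of the answer above.

```python
from itertools import zip_longest


def pair_up_padded(iterable, fillvalue=None):
    """Pair consecutive items; pad the last pair if the count is odd."""
    it = iter(iterable)
    # passing the same iterator twice makes zip_longest draw consecutive items
    return list(zip_longest(it, it, fillvalue=fillvalue))


pairs_even = pair_up_padded(range(4))
pairs_odd = pair_up_padded(range(5))
print(pairs_even)
print(pairs_odd)
```

Whether padding, raising, or truncating is the right behaviour depends on what an odd-length input means for the data at hand.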
