Merge List of lists where sublists have common elements - python

I have a list of lists like this
list = [[1, 2], [1, 3], [4, 5]]
As you can see, the first element of the first two sublists is repeated.
So I want my output to be:
list = [[1, 2, 3], [4, 5]]
Thank you

The following code should solve your problem:
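def merge_subs(lst_of_lsts):
    res = []
    for row in lst_of_lsts:
        for i, resrow in enumerate(res):
            if row[0] == resrow[0]:
                res[i] += row[1:]
                break
        else:
            # no existing group starts with row[0]; append a copy so the
            # in-place += above never mutates the caller's sublists
            res.append(row[:])
    return res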
Note that the else belongs to the inner for loop and is executed only if the loop finishes without hitting the break.

I have a solution that first builds a dict keyed on the first values, then creates a list from it. In Python versions before 3.7 the order may not be the same (i.e. [4, 5] may come before [1, 2, 3]); since 3.7, dicts preserve insertion order:
>>> from collections import defaultdict
>>> l = [[1, 2], [1, 3], [4, 5]]
>>> d = defaultdict(list)
>>> for x in l:
...     d[x[0]].extend(x[1:])  # everything after the key
...
>>> d
defaultdict(<class 'list'>, {1: [2, 3], 4: [5]})
>>> [[key] + val for key, val in d.items()]
[[1, 2, 3], [4, 5]]

You can use Python sets, since they make intersection and union easy to compute. The code would be clearer, though the complexity would probably be comparable to the other solutions.
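A set-based sketch of that idea, assuming that "common elements" means any shared element rather than only the first one, and that order and duplicates inside a sublist don't matter (merge_overlapping is a hypothetical name). Groups are merged transitively, so [1, 2], [3, 4] and [2, 3] all end up in one group:

```python
def merge_overlapping(lists):
    """Merge sublists that share at least one element (a set-based sketch)."""
    merged = []  # each entry is a set holding one merged group
    for sub in lists:
        current = set(sub)
        untouched = []
        for group in merged:
            if group & current:       # non-empty intersection: same group
                current |= group      # absorb it via union
            else:
                untouched.append(group)
        untouched.append(current)
        merged = untouched
    # sets are unordered, so sort each group for a stable, readable result
    return [sorted(group) for group in merged]


print(merge_overlapping([[1, 2], [1, 3], [4, 5]]))  # [[1, 2, 3], [4, 5]]
```

Sorting the groups assumes the elements are mutually comparable; drop the sorted() call if they are not.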

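A compact, though arguably unreadable, version:
# Note the _ after the name; otherwise you would shadow the built-in list type in your scope
from itertools import groupby
list_ = [[1, 2], [1, 3], [4, 5]]
# groupby only merges *adjacent* sublists with the same first element,
# so sort by x[0] first if equal keys may be far apart
grouper = lambda l: [[k] + sum((v[1:] for v in vs), []) for k, vs in groupby(l, lambda x: x[0])]
print(grouper(list_))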
A more readable variant:
from collections import defaultdict
groups = defaultdict(list)
for vs in list_:
    groups[vs[0]] += vs[1:]
print([[k] + v for k, v in groups.items()])
Note that these solve a more general form of your problem: instead of [[1, 2], [1, 3], [4, 5]], you could also have something like [[1, 2, 3], [1, 4, 5], [2, 4, 5, 6], [3]].
Why the _? This is why you don't want to overwrite list:
spam = list()
print(spam)
# prints []
list = spam
print(list)
# prints []
spam = list()
# TypeError: 'list' object is not callable
As you can see above, by setting list = spam we broke the default behaviour of list().
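If the name has already been shadowed, the built-in is still reachable through the builtins module, and deleting the shadowing name restores it. A minimal sketch, assuming module-level or interactive code (inside a function, list would be a local name and calling it after del would raise UnboundLocalError):

```python
import builtins

list = [1, 2]                # shadows the built-in list type in this module
spam = builtins.list("ab")   # the built-in is still reachable via the builtins module
del list                     # remove the shadowing name; lookups fall back to the built-in
eggs = list("cd")            # list() works again
print(spam, eggs)            # ['a', 'b'] ['c', 'd']
```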

Related

How to merge smaller sub-elements into larger parent-elements in a list?

I have a list of lists, but some lists are "sublists" of other lists. What I want to do is remove the sublists from the larger list so that we only have the largest unique sublists.
For example:
>>> some_list = [[1], [1, 2], [1, 2, 3], [1, 4]]
>>> ideal_list = [[1, 2, 3], [1, 4]]
The code that I've written right now is:
new_list = []
for i in range(len(some_list)):
    for j in range(i + 1, len(some_list)):
        count = 0
        for k in some_list[i]:
            if k in some_list[j]:
                count += 1
        if count == len(some_list[i]):
            new_list.append(some_list[j])
The basic algorithm that I had in mind is that we'd check if a list's elements were in the following sublists, and if so then we use the other larger sublist. It doesn't give the desired output (it actually gives [[1, 2], [1, 2, 3], [1, 4], [1, 2, 3]]) and I'm wondering what I could do to achieve what I want.
I don't want to use sets because duplicate elements matter.
Same idea as with sets, but using Counter instead, so duplicate elements are taken into account. The sublist check should also be more efficient than a brute-force membership test:
from collections import Counter
new_list = []
counters = []
for arr in sorted(some_list, key=len, reverse=True):
    arr_counter = Counter(arr)
    if any((c & arr_counter) == arr_counter for c in counters):
        continue  # it is a sublist of something else
    new_list.append(arr)
    counters.append(arr_counter)
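Wrapped into a function (remove_sub_multisets is a hypothetical name), the Counter version can be checked on an input where duplicates matter: [1, 1] is kept because no longer list contains 1 twice. Counter's & keeps the minimum count of each element, which is what turns the equality check into a multiset containment test:

```python
from collections import Counter


def remove_sub_multisets(some_list):
    """Keep only lists not contained (counting duplicates) in a longer kept list."""
    new_list = []
    counters = []
    # Longest first, so any container is already kept when a smaller list is checked.
    for arr in sorted(some_list, key=len, reverse=True):
        arr_counter = Counter(arr)
        # (c & arr_counter) == arr_counter means c has every element of arr
        # at least as many times as arr does.
        if any((c & arr_counter) == arr_counter for c in counters):
            continue
        new_list.append(arr)
        counters.append(arr_counter)
    return new_list


print(remove_sub_multisets([[1], [1, 2], [1, 2, 3], [1, 4], [1, 1]]))
# [[1, 2, 3], [1, 4], [1, 1]]
```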
With some inspiration from @mkrieger1's comment, one possible solution would be:
def merge_sublists(some_list):
    new_list = []
    for i in range(len(some_list)):
        true_or_false = []
        for j in range(len(some_list)):
            if some_list[j] == some_list[i]:
                continue
            true_or_false.append(all([x in some_list[j] for x in some_list[i]]))
        if not any(true_or_false):
            new_list.append(some_list[i])
    return new_list
As stated in the comment, a brute-force solution is to loop through each element and check whether it's a sublist of any other sublist; if it isn't, append it to the new list.
Test cases:
>>> merge_sublists([[1], [1, 2], [1, 2, 3], [1, 4]])
[[1, 2, 3], [1, 4]]
>>> merge_sublists([[1, 2, 3], [4, 5], [3, 4]])
[[1, 2, 3], [4, 5], [3, 4]]
Input:
l = [[1], [1, 2], [1, 2, 3], [1, 4]]
One way:
l1 = l.copy()
for i in l:
    for j in l:
        # drop i if it is a subset of some different sublist j
        if set(i).issubset(set(j)) and i != j:
            l1.remove(i)
            break
print(l1)
This prints:
[[1, 2, 3], [1, 4]]
EDIT (taking care of duplicate sublists as well; since this goes through a set, the order of the result is not guaranteed):
l1 = [list(tupl) for tupl in {tuple(item) for item in l}]
l2 = l1.copy()
for i in l1:
    for j in l1:
        if set(i).issubset(set(j)) and i != j:
            l2.remove(i)
            break

Merge python lists of different lengths

I am attempting to merge two Python lists, where their values at a given index will form a list (element) in a new list. For example:
merge_lists([1,2,3,4], [1,5]) = [[1,1], [2,5], [3], [4]]
I could iterate on this function to combine ever more lists. What is the most efficient way to accomplish this?
Edit (part 2)
Upon testing the answer I had previously selected, I realized I had additional criteria and a more general problem. I would also like to combine lists containing lists or values. For example:
merge_lists([[1,2],[1]] , [3,4]) = [[1,2,3], [1,4]]
The answers currently provided generate lists of higher dimensions in cases like this.
One option is to use itertools.zip_longest (in Python 3):
from itertools import zip_longest
[[x for x in t if x is not None] for t in zip_longest([1,2,3,4], [1,5])]
# [[1, 1], [2, 5], [3], [4]]
If you prefer sets:
[{x for x in t if x is not None} for t in zip_longest([1,2,3,4], [1,5])]
# [{1}, {2, 5}, {3}, {4}]
In Python 2, use itertools.izip_longest:
from itertools import izip_longest
[[x for x in t if x is not None] for t in izip_longest([1,2,3,4], [1,5])]
#[[1, 1], [2, 5], [3], [4]]
Update to handle the slightly more complicated case:
def flatten(lst):
    result = []
    for s in lst:
        if isinstance(s, list):
            result.extend(s)
        else:
            result.append(s)
    return result
This handles the above two cases pretty well:
[flatten(x for x in t if x is not None) for t in izip_longest([1,2,3,4], [1,5])]
# [[1, 1], [2, 5], [3], [4]]
[flatten(x for x in t if x is not None) for t in izip_longest([[1,2],[1]] , [3,4])]
# [[1, 2, 3], [1, 4]]
Note that even though this works for the two cases above, it can still break for more deeply nested structures, which get complicated very quickly. For a more general solution, you can see here.
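For arbitrarily deep nesting, a recursive version of flatten is one option. This is a sketch assuming only list counts as a nested container (tuples and other iterables are kept as plain values); deep_flatten and merge_nested are hypothetical names, and zip_longest is the Python 3 spelling:

```python
from itertools import zip_longest


def deep_flatten(items):
    """Recursively flatten nested lists into one flat list."""
    result = []
    for item in items:
        if isinstance(item, list):
            result.extend(deep_flatten(item))  # recurse into nested lists
        else:
            result.append(item)
    return result


def merge_nested(a, b):
    # Pair up elements by index, drop the None padding, then flatten each pair.
    return [deep_flatten(x for x in t if x is not None) for t in zip_longest(a, b)]


print(merge_nested([[1, [2]], [1]], [3, 4]))  # [[1, 2, 3], [1, 4]]
```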
Another way to have your desired output using zip():
def merge(a, b):
    m = min(len(a), len(b))
    sub = []
    for k, v in zip(a, b):
        sub.append([k, v])
    # append the leftover tail of the longer list as one-element lists
    return sub + list([k] for k in a[m:]) if len(a) > len(b) else sub + list([k] for k in b[m:])
a = [1, 2, 3, 4]
b = [1, 5]
print(merge(a, b))
# [[1, 1], [2, 5], [3], [4]]
You could use itertools.izip_longest and filter():
>>> lst1, lst2 = [1, 2, 3, 4], [1, 5]
>>> from itertools import izip_longest
>>> [list(filter(None, x)) for x in izip_longest(lst1, lst2)]
[[1, 1], [2, 5], [3], [4]]
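How it works: izip_longest() aggregates the elements from the two lists, filling missing values with None, which you then filter out with filter(). Note that filter(None, ...) removes every falsy value, so 0, '' or False would be dropped as well; use a test such as x is not None if your lists can contain such values.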
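If the lists may legitimately contain None or other falsy values, one option is to pass a private sentinel object as zip_longest's fillvalue and filter on identity with it. The sketch below (Python 3; merge_columns is a hypothetical name) also accepts any number of lists, which covers combining "ever more lists" in a single call:

```python
from itertools import zip_longest

_MISSING = object()  # private sentinel: cannot be equal to or identical with real data


def merge_columns(*lists):
    """Group the i-th elements of every list; shorter lists just contribute nothing."""
    return [
        [x for x in column if x is not _MISSING]
        for column in zip_longest(*lists, fillvalue=_MISSING)
    ]


print(merge_columns([1, 2, 3, 4], [1, 5]))  # [[1, 1], [2, 5], [3], [4]]
print(merge_columns([0, None, 3], [1]))     # [[0, 1], [None], [3]]
```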
Another way, using zip_longest and chain from itertools. Note that this produces a single flat, interleaved list rather than a list of lists:
import itertools
[i for i in list(itertools.chain(*itertools.zip_longest(list1, list2, list3))) if i is not None]
or in 2 lines (more readable):
merged_list = list(itertools.chain(*itertools.zip_longest(a, b, c)))
merged_list = [i for i in merged_list if i is not None]

Convert nested iterables to list

Is there an easy way in Python (using itertools, or otherwise) to convert a nested iterable f into its corresponding list or tuple? I'd like to save f so I can iterate over it multiple times, which means that if some nested elements of f are generators, I'll be in trouble.
I'll give an example input/output.
>>> g = iter(range(2))
>>> my_input = [1, [2, 3], ((4), 5), [6, g]]
>>> magical_function(my_input)
[1, [2, 3], [[4], 5], [6, [0, 1]]]
It would be fine if the output consisted of tuples, too. The issue is that iterating over g "consumes" it, so it can't be used again.
This seems best done by checking whether each element is iterable and, if it is, calling a recursive function on it. As a quick draft, I would try something like:
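from collections.abc import Iterable  # collections.Iterable was removed in Python 3.10
g = iter(range(2))
my_input = [1, [2, 3], ((4), 5), [6, g]]
def unfold(iterable):
    ret = []
    for element in iterable:
        # str is iterable and each character is itself a str, so treat strings
        # as atoms to avoid infinite recursion
        if isinstance(element, Iterable) and not isinstance(element, (str, bytes)):
            ret.append(unfold(element))
        else:
            ret.append(element)
    return ret
n = unfold(my_input)
print(n)
print(n)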
which prints:
$ python3 so.py
[1, [2, 3], [4, 5], [6, [0, 1]]]
[1, [2, 3], [4, 5], [6, [0, 1]]]
It's not the prettiest approach, and it can be improved (for example, it turns everything into a list instead of preserving tuples), but it shows the general idea I would use.
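A variant that keeps tuples as tuples might look like the sketch below. Assumptions: str and bytes are treated as atoms, and any other iterable that is not a tuple (lists, generators, sets, dicts, ...) is materialized as a list; freeze is a hypothetical name:

```python
from collections.abc import Iterable


def freeze(obj):
    """Recursively materialize nested iterables so the result can be reused."""
    # Strings are iterable, but each character is again a string, so stop here.
    if isinstance(obj, (str, bytes)) or not isinstance(obj, Iterable):
        return obj
    items = [freeze(x) for x in obj]
    # Keep tuples as tuples; every other iterable becomes a list.
    return tuple(items) if isinstance(obj, tuple) else items


g = iter(range(2))
result = freeze([1, [2, 3], ((4), 5), [6, g]])
print(result)  # [1, [2, 3], (4, 5), [6, [0, 1]]]
```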

How can I find same values in a list and group together a new list?

From this list:
N = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
I'm trying to create:
L = [[1],[2,2],[3,3,3],[4,4,4,4],[5,5,5,5,5]]
Any values that are the same are grouped into their own sublist.
Here is my attempt so far. I'm thinking I should use a while loop?
global n
n = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5] #Sorted list
l = [] #Empty list to append values to
def compare(val):
    """ This function receives index values
    from the n list (n[0] etc) """
    global valin
    valin = val
    global count
    count = 0
    for i in xrange(len(n)):
        if valin == n[count]: # If the input value i.e. n[x] == n[iteration]
            temp = valin, n[count]
            l.append(temp) #append the values to a new list
            count +=1
        else:
            count +=1
for x in xrange (len(n)):
    compare(n[x]) #pass the n[x] to compare function
Use itertools.groupby:
from itertools import groupby
N = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
print([list(j) for i, j in groupby(N)])
Output:
[[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5]]
Side note: avoid global variables when you don't need them.
Someone pointed out that for N = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 1], groupby gives [[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5], [1]].
In other words, groupby only groups adjacent equal values, so it doesn't give the desired result when the list isn't sorted.
Here is an alternative that works regardless of order:
from collections import Counter
N = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
C = Counter(N)
# each key repeated by its count; groups come out in first-seen order (Python 3.7+)
print([[k] * v for k, v in C.items()])
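Another option, assuming the values are sortable, is to sort first so that equal values become adjacent and then apply groupby; unlike the Counter version, this returns the groups in sorted order:

```python
from itertools import groupby

N = [1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 5, 5, 1]
# Sorting makes equal values adjacent, which is what groupby relies on.
L = [list(group) for _, group in groupby(sorted(N))]
print(L)  # [[1, 1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5]]
```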
You can use itertools.groupby along with a list comprehension
>>> import itertools
>>> l = [1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]
>>> [list(v) for k,v in itertools.groupby(l)]
[[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5]]
This can be assigned to the variable L as in
L = [list(v) for k,v in itertools.groupby(l)]
You're overcomplicating this.
What you want to do is: for each value, if it's the same as the last value, just append it to the list of last values; otherwise, create a new list. You can translate that English directly to Python:
new_list = []
for value in old_list:
    if new_list and new_list[-1][0] == value:
        new_list[-1].append(value)
    else:
        new_list.append([value])
There are even simpler ways to do this if you're willing to get a bit more abstract, e.g., by using the grouping functions in itertools. But this should be easy to understand.
If you really need to do this with a while loop, you can translate any for loop into a while loop like this:
for value in iterable:
    do_stuff(value)
# is equivalent to:
iterator = iter(iterable)
while True:
    try:
        value = next(iterator)
    except StopIteration:
        break
    do_stuff(value)
Or, if you know the iterable is a sequence, you can use a slightly simpler while loop:
index = 0
while index < len(sequence):
    value = sequence[index]
    do_stuff(value)
    index += 1
But both of these make your code less readable, less Pythonic, more complicated, less efficient, easier to get wrong, etc.
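For completeness, here is the index-based while loop applied to the grouping problem. It is only a sketch (group_with_while is a hypothetical name); the for loop version remains the one to prefer:

```python
def group_with_while(sequence):
    """Group adjacent equal values using an index-based while loop."""
    new_list = []
    index = 0
    while index < len(sequence):
        value = sequence[index]
        if new_list and new_list[-1][0] == value:
            new_list[-1].append(value)  # same as the previous value: extend its group
        else:
            new_list.append([value])    # new value: start a new group
        index += 1
    return new_list


print(group_with_while([1, 2, 2, 3, 3, 3]))  # [[1], [2, 2], [3, 3, 3]]
```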
You can do that using numpy too:
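import numpy as np
N = np.array([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5])
# np.alen is deprecated; for a 1-D array, len(N) gives the same value
counter = np.arange(1, len(N))
# split wherever a value differs from its predecessor
L = np.split(N, counter[N[1:] != N[:-1]])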
The advantage of this method shows when you have another array that is aligned with N and you want to split it at the same places.
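To illustrate that advantage, here is a sketch with a hypothetical labels array of the same length as N; the split points are computed once and reused for both arrays:

```python
import numpy as np

N = np.array([1, 2, 2, 3, 3, 3])
labels = np.array(["a", "b", "c", "d", "e", "f"])  # hypothetical data aligned with N

# Positions where a value differs from the one before it.
counter = np.arange(1, len(N))
split_points = counter[N[1:] != N[:-1]]

groups = np.split(N, split_points)
label_groups = np.split(labels, split_points)  # same boundaries, different data
print([g.tolist() for g in groups])        # [[1], [2, 2], [3, 3, 3]]
print([g.tolist() for g in label_groups])  # [['a'], ['b', 'c'], ['d', 'e', 'f']]
```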
Another slightly different solution that doesn't rely on itertools:
#!/usr/bin/env python
def group(items):
    """
    groups a sorted list of integers into sublists based on the integer key
    """
    if len(items) == 0:
        return []
    grouped_items = []
    prev_item, rest_items = items[0], items[1:]
    subgroup = [prev_item]
    for item in rest_items:
        if item != prev_item:
            grouped_items.append(subgroup)
            subgroup = []
        subgroup.append(item)
        prev_item = item
    grouped_items.append(subgroup)
    return grouped_items
print(group([1,2,2,3,3,3,4,4,4,4,5,5,5,5,5]))
# [[1], [2, 2], [3, 3, 3], [4, 4, 4, 4], [5, 5, 5, 5, 5]]

Remove duplicated lists in list of lists in Python

I've seen some closely related questions here, but their answers don't work for me. I have a list of lists where some sublists are repeated, but their elements may be in a different order. For example:
g = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [9, 0, 1], [4, 3, 2]]
According to my question, the output should be:
g = [[1,2,3],[9,0,1],[4,3,2]]
I've tried using set, but it only removes lists that are exactly equal (I thought it would work because sets are by definition unordered). The other questions I visited only have examples with exactly duplicated lists, like this one: Python : How to remove duplicate lists in a list of list?. For now, the order of the output (for the list and the sublists) is not a problem.
(ab)using side-effects version of a list comp:
seen = set()
# seen.add() returns None, so "not seen.add(...)" is always True and just records x
[x for x in g if frozenset(x) not in seen and not seen.add(frozenset(x))]
# [[1, 2, 3], [9, 0, 1], [4, 3, 2]]
For those (unlike myself) who don't like using side-effects in this manner:
res = []
seen = set()
for x in g:
    x_set = frozenset(x)
    if x_set not in seen:
        res.append(x)
        seen.add(x_set)
The reason that you add frozensets to the set is that you can only add hashable objects to a set, and vanilla sets are not hashable.
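A quick way to see this: adding a plain set to a set raises TypeError, while equal frozensets hash the same and collapse into a single entry:

```python
seen = set()
try:
    seen.add({1, 2})              # a plain set is unhashable
    added_plain_set = True
except TypeError:
    added_plain_set = False

seen.add(frozenset({1, 2}))       # frozenset is hashable, so this works
seen.add(frozenset([2, 1]))       # equal frozenset: no new entry
print(added_plain_set, len(seen))  # False 1
```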
If you don't care about the order for lists and sublists (and all items in sublists are unique):
result = set(map(frozenset, g))
If a sublist may contain duplicates, e.g. [1, 2, 1, 3], use tuple(sorted(sublist)) instead of frozenset(sublist), because frozenset discards duplicates within a sublist.
If you want to preserve the order of sublists:
def del_dups(seq, key=frozenset):
    # removes duplicates from seq in place (returns None), keeping first occurrences
    seen = {}
    pos = 0
    for item in seq:
        if key(item) not in seen:
            seen[key(item)] = True
            seq[pos] = item
            pos += 1
    del seq[pos:]
Example:
del_dups(g, key=lambda x: tuple(sorted(x)))
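Since del_dups works in place and returns None, its effect is easiest to see by inspecting the lists afterwards. This sketch repeats the function and uses a hypothetical input containing [1, 2, 1, 3] to show how the two key choices differ:

```python
def del_dups(seq, key=frozenset):
    # removes duplicates from seq in place (returns None), keeping first occurrences
    seen = {}
    pos = 0
    for item in seq:
        if key(item) not in seen:
            seen[key(item)] = True
            seq[pos] = item
            pos += 1
    del seq[pos:]


g = [[1, 2, 3], [3, 2, 1], [1, 2, 1, 3], [9, 0, 1]]
h = list(g)  # shallow copy so both keys can be tried on the same data
del_dups(g)                                  # frozenset key: [1, 2, 1, 3] counts as a duplicate
del_dups(h, key=lambda x: tuple(sorted(x)))  # sorted-tuple key keeps it
print(g)  # [[1, 2, 3], [9, 0, 1]]
print(h)  # [[1, 2, 3], [1, 2, 1, 3], [9, 0, 1]]
```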
See In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?
What about using frozenset, as roippi mentioned, this way:
>>> g = [list(x) for x in set(frozenset(i) for i in g)]
>>> g
[[0, 9, 1], [1, 2, 3], [2, 3, 4]]
I would convert each element in the list to a frozenset (which is hashable), then create a set out of it to remove duplicates:
>>> g = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [9, 0, 1], [4, 3, 2]]
>>> set(map(frozenset, g))
set([frozenset([0, 9, 1]), frozenset([1, 2, 3]), frozenset([2, 3, 4])])
If you need to convert the elements back to lists (in Python 3, map returns an iterator, so wrap the result in list()):
>>> list(map(list, set(map(frozenset, g))))
[[0, 9, 1], [1, 2, 3], [2, 3, 4]]
