How to remove all subsets from a list of lists - Python

What is an efficient way to remove sublists from a list of lists? I only want to keep the biggest sets in the list. For example:
b = [[1,2,3], [1,2], [3,5], [2,3,4], [2,3,4], [3,4,5], [1,2,4,6,7]]
and I want the following output:
result = [[1,2,3], [2,3,4], [3,4,5], [1,2,4,6,7]]
Because [1,2] is a subset of [1,2,3] and [1,2,4,6,7], and [3,5] is a subset of [3,4,5]. Also, [2,3,4] appears twice, but I only want it counted once in the final result. I want to filter the data based on subset logic.
The only solution I could think of uses two loops; is there a more efficient way to solve this problem?
Here is what I tried (after optimising it a bit: I added a break and skip indices that are already recorded, so nothing is counted twice):
b = [[1,2,3], [1,2], [3,5], [2,3,4], [2,3,4], [3,4,5], [1,2,4,6,7]]
i = 0
record = []
subset_status = False
for index, re in enumerate(b):
    while i <= (len(b)-1):
        if i != index:
            if i not in record:
                if set(re).issubset(b[i]):
                    subset_status = True
                    break
        i += 1
    i = 0
    if subset_status:
        record.append(index)
        subset_status = False
print(record)
>>[1, 2, 3]
So the entries at indices 1, 2 and 3 are the dirty data.
Thanks.
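The indices collected in `record` can be turned into the cleaned list with one more pass. A minimal sketch, assuming `record` holds the `[1, 2, 3]` printed above:

```python
b = [[1,2,3], [1,2], [3,5], [2,3,4], [2,3,4], [3,4,5], [1,2,4,6,7]]
record = [1, 2, 3]  # dirty indices found by the loop above

# A set makes each "k not in dirty" check O(1)
dirty = set(record)
result = [x for k, x in enumerate(b) if k not in dirty]
print(result)  # [[1, 2, 3], [2, 3, 4], [3, 4, 5], [1, 2, 4, 6, 7]]
```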

Filter your list on a condition:
b = [[1,2,3], [1,2], [3,5], [2,3,4],[3,4,5]]
print(list(filter(lambda x: len(x) == 3, b)))
# [[1, 2, 3], [2, 3, 4], [3, 4, 5]]

A conditional list comprehension is a Pythonic, flexible and performant approach. It is usually faster and less error-prone to assemble the clean list from scratch than to repeatedly remove elements:
b = [[1, 2, 3], [1, 2], [3, 5], [2, 3, 4],[3, 4, 5]]
cleaned = [x for x in b if clean(x)] # where clean is your condition
# e.g.
cleaned = [x for x in b if len(x) == 3]
# [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
If you need to mutate the original list object, use slice assignment:
b[:] = [x for x in b if clean(x)]
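To see why slice assignment matters, here is a small sketch (the name `alias` is just for illustration, and `len(x) == 3` stands in for `clean`): a plain `b = ...` rebinds the name to a new list, while `b[:] = ...` replaces the contents of the existing object, so every other reference sees the change.

```python
b = [[1, 2, 3], [1, 2], [3, 5], [2, 3, 4], [3, 4, 5]]
alias = b  # a second reference to the same list object

b = [x for x in b if len(x) == 3]  # rebinding: alias still points at the old list
print(len(alias))  # 5

b = alias
b[:] = [x for x in b if len(x) == 3]  # slice assignment: mutates the shared object
print(alias is b, len(alias))  # True 3
```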

One way to do this is to process the lists in b in order of length, from longest to shortest.
b = [[1,2,3], [1,2], [3,5], [2,3,4], [2,3,4], [3,4,5], [1,2,4,6,7]]
result = []
for u in sorted(map(set, b), key=len, reverse=True):
    if not any(u <= v for v in result):
        result.append(u)
print(result)
Output:
[{1, 2, 4, 6, 7}, {1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
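The approach hinges on the set operator `<=`, which means "is a subset of, or equal to". Equal sets also pass the test, which is why the duplicate [2,3,4] is dropped, and sorting longest first guarantees that any proper superset of `u` (which must be strictly longer) is already in `result` when `u` is checked. A quick illustration of the operator:

```python
checks = {
    "proper subset": {1, 2} <= {1, 2, 3},
    "equal sets": {2, 3, 4} <= {2, 3, 4},      # equal counts as subset, so duplicates are skipped
    "element missing": {3, 5} <= {1, 2, 3},    # 5 is not in the right-hand set
    "strict < on equal sets": {1, 2} < {1, 2},  # < means proper subset only
}
for label, value in checks.items():
    print(label, value)
# proper subset True
# equal sets True
# element missing False
# strict < on equal sets False
```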
If you need to keep the inner lists as actual lists, and you also need to preserve the order, then we can do that with an additional pass over the data. But instead of using a list for result I'll use a set to make the tests more efficient. And that means turning the sublists into frozensets: plain sets won't work because only hashable objects can be put into a set.
b = [[1,2,3], [1,2], [3,5], [2,3,4], [2,3,4], [3,4,5], [1,2,4,6,7]]
temp = set()
for u in sorted(map(frozenset, b), key=len, reverse=True):
    if not any(u <= v for v in temp):
        temp.add(u)
newb = []
for u in b:
    if set(u) in temp and u not in newb:
        newb.append(u)
print(newb)
Output:
[[1, 2, 3], [2, 3, 4], [3, 4, 5], [1, 2, 4, 6, 7]]
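The two passes above can be packaged into one reusable function. A sketch, where `remove_subsets` is a hypothetical name; like the answer, it treats each sublist as a set, so repeated elements inside a sublist are ignored:

```python
def remove_subsets(lists):
    """Keep lists not contained in another list, in original order, without duplicates."""
    maximal = set()
    # Longest first: a proper superset of u is strictly longer, so it is already in maximal
    for u in sorted(map(frozenset, lists), key=len, reverse=True):
        if not any(u <= v for v in maximal):
            maximal.add(u)
    result, seen = [], set()
    for u in lists:
        key = frozenset(u)
        if key in maximal and key not in seen:  # O(1) duplicate check instead of scanning result
            seen.add(key)
            result.append(u)
    return result

b = [[1,2,3], [1,2], [3,5], [2,3,4], [2,3,4], [3,4,5], [1,2,4,6,7]]
print(remove_subsets(b))  # [[1, 2, 3], [2, 3, 4], [3, 4, 5], [1, 2, 4, 6, 7]]
```

One design difference from the answer: duplicates are detected by set equality, so [2,3,4] and [4,3,2] would count as the same entry and only the first one is kept.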

This is not very good, but it works:
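result = []
for i in b:
    for j in result:
        if all(c in j for c in i):
            break  # i is already covered by a kept list
    else:
        result.append(i)  # no kept list contains i
for i in result[:]:  # iterate over a copy, because we delete from result
    for j in result:
        if all(c in j for c in i) and result.index(i) != result.index(j):
            del result[result.index(i)]
            break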

You can use tuples and itertools.product to detect whether an item is a sublist, then construct a new list excluding those sublists.
With list comprehensions:
from itertools import product
b = [[1,2,3], [1,2], [3,5], [2,3,4], [3,4,5], [1,2,4,6,7]]
dirty = [i for i in b for j in b if i != j if tuple(i) in product(j, repeat = len(i))]
clean = [i for i in b if i not in dirty]
Expanded explanation:
dirty = []
for i in b:
    for j in b:
        if i != j:
            if tuple(i) in product(j, repeat = len(i)):
                dirty.append(i)
clean = [i for i in b if i not in dirty]
print(clean)
Output:
[[1, 2, 3], [2, 3, 4], [3, 4, 5], [1, 2, 4, 6, 7]]
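`tuple(i) in product(j, repeat=len(i))` is true exactly when every element of `i` occurs in `j`, because `product` enumerates every length-`len(i)` tuple drawn from `j`. A sketch comparing it with the direct `all(...)` test (both helper names are hypothetical); note that `product` may generate up to `len(j) ** len(i)` tuples before it finds a match, so the direct test scales much better:

```python
from itertools import product

def via_product(i, j):
    # Enumerates up to len(j) ** len(i) candidate tuples
    return tuple(i) in product(j, repeat=len(i))

def via_all(i, j):
    # Same test with one membership check per element of i
    return all(x in j for x in i)

pairs = [([1, 2], [1, 2, 3]), ([3, 5], [3, 4, 5]), ([3, 5], [1, 2, 3]), ([2, 2], [2])]
for i, j in pairs:
    print(i, j, via_product(i, j), via_all(i, j))
```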

Related

How to extract elements from a nested list

I have a nested list in the form of [[1,2,3], [3,4,5], [8,6,2,5,6], [7,2,9]]
I would like to extract every first item into a new list, every second item into a new list and the rest into a new nested list:
a = [1,3,8,7], b = [2,4,6,2], c = [[3], [5], [2,5,6],[9]]
Is it possible to avoid using the for loop because the real nested list is quite large? Any help would be appreciated.
Ultimately, whatever your solution is, there will have to be a for loop somewhere in your code, so my advice is to make it as clean and readable as possible.
That being said, here's what I would propose:
arr = [[1,2,3], [3,4,5], [8,6,2,5,6], [7,2,9]]
first_arr, second_arr, third_arr = [], [], []
for nested in arr:
    first_arr.append(nested[0])
    second_arr.append(nested[1])
    third_arr.append(nested[2:])
This is a naive, simple looped solution using list comprehensions, but see if it is fast enough for you.
l = [[1,2,3], [3,4,5], [8,6,2,5,6], [7,2,9]]
a = [i[0] for i in l]
b = [i[1] for i in l]
c = [i[2:] for i in l]
which returns:
>>a
[1, 3, 8, 7]
>>b
[2, 4, 6, 2]
>>c
[[3], [5], [2, 5, 6], [9]]
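If the goal is only to avoid writing the loop yourself, `operator.itemgetter` combined with `map` pushes the iteration into built-ins. A sketch; it still visits every sublist once, so don't expect it to be asymptotically faster than the comprehensions:

```python
from operator import itemgetter

l = [[1,2,3], [3,4,5], [8,6,2,5,6], [7,2,9]]

a = list(map(itemgetter(0), l))               # first items
b = list(map(itemgetter(1), l))               # second items
c = list(map(itemgetter(slice(2, None)), l))  # everything from index 2 on, i.e. i[2:]
print(a, b, c)
```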
I cannot think of a solution without for loops at the moment; I hope to update my answer later.
Here's a solution using for loops:
data = [[1,2,3], [3,4,5], [8,6,2,5,6], [7,2,9]]
list1 = []
list2 = []
list3 = []
for item in data:
    else_list = []
    for index, value in enumerate(item):
        if index == 0:
            list1.append(value)
        elif index == 1:
            list2.append(value)
        else:
            else_list.append(value)
    list3.append(else_list)
print(list1)
print(list2)
print(list3)
Output
[1, 3, 8, 7]
[2, 4, 6, 2]
[[3], [5], [2, 5, 6], [9]]
Just for fun, here is a performance comparison as well. Great job using just one for loop, Meysam!
import timeit
# a = [1,3,8,7] b = [2,4,6,2], c = [[3], [5], [2,5,6],[9]]
def solution_1():
    data = [[1, 2, 3], [3, 4, 5], [8, 6, 2, 5, 6], [7, 2, 9]]
    list1 = []
    list2 = []
    list3 = []
    for item in data:
        else_list = []
        for index, value in enumerate(item):
            if index == 0:
                list1.append(value)
            elif index == 1:
                list2.append(value)
            else:
                else_list.append(value)
        list3.append(else_list)
def solution_2():
    arr = [[1, 2, 3], [3, 4, 5], [8, 6, 2, 5, 6], [7, 2, 9]]
    first_arr, second_arr, third_arr = [], [], []
    for nested in arr:
        first_arr.append(nested[0])
        second_arr.append(nested[1])
        third_arr.append(nested[2:])
def solution_3():
    l = [[1, 2, 3], [3, 4, 5], [8, 6, 2, 5, 6], [7, 2, 9]]
    a = [i[0] for i in l]
    b = [i[1] for i in l]
    c = [i[2:] for i in l]
if __name__ == "__main__":
    print("solution_1 performance:")
    print(timeit.timeit("solution_1()", "from __main__ import solution_1", number=10))
    print("solution_2 performance:")
    print(timeit.timeit("solution_2()", "from __main__ import solution_2", number=10))
    print("solution_3 performance:")
    print(timeit.timeit("solution_3()", "from __main__ import solution_3", number=10))
Output (total seconds for 10 calls of each function; the numbers will vary by machine):
solution_1 performance:
9.580000000000005e-05
solution_2 performance:
1.7200000000001936e-05
solution_3 performance:
1.7499999999996685e-05
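The timings above come from 10 calls on a four-element list, so they likely reflect per-call overhead more than the approaches themselves. A sketch for a fairer comparison, assuming a larger randomly generated input (the `make_data` helper, the sizes and the function names are arbitrary, hypothetical choices); it also checks that the three approaches return the same result before timing them:

```python
import random
import timeit

def make_data(n, seed=0):
    # Hypothetical test data: n sublists, each holding 3 to 6 random ints
    rng = random.Random(seed)
    return [[rng.randint(0, 9) for _ in range(rng.randint(3, 6))] for _ in range(n)]

def with_enumerate(data):
    list1, list2, list3 = [], [], []
    for item in data:
        rest = []
        for index, value in enumerate(item):
            if index == 0:
                list1.append(value)
            elif index == 1:
                list2.append(value)
            else:
                rest.append(value)
        list3.append(rest)
    return list1, list2, list3

def with_one_loop(data):
    first, second, rest = [], [], []
    for nested in data:
        first.append(nested[0])
        second.append(nested[1])
        rest.append(nested[2:])
    return first, second, rest

def with_comprehensions(data):
    return [i[0] for i in data], [i[1] for i in data], [i[2:] for i in data]

data = make_data(10_000)
# The timings only mean something if all three produce identical output
assert with_enumerate(data) == with_one_loop(data) == with_comprehensions(data)

for fn in (with_enumerate, with_one_loop, with_comprehensions):
    print(fn.__name__, timeit.timeit(lambda: fn(data), number=20))
```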
If the nested list has unknown depth, we'd have to use recursion:
def get_elements(l):
    ret = []
    for elem in l:
        if type(elem) == list:
            ret.extend(get_elements(elem))
        else:
            ret.append(elem)
    return ret
l = [1,2,[3,4],[[5],[6]]]
print(get_elements(l))
# Output: [1, 2, 3, 4, 5, 6]
That said, unknown-depth nested lists are best avoided in the first place.
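Recursion is bounded by Python's recursion limit (see `sys.getrecursionlimit()`), so extremely deep nesting can raise `RecursionError`. A sketch of an iterative alternative that keeps an explicit stack of iterators; `flatten` is a hypothetical name, and it descends into lists only:

```python
def flatten(l):
    result = []
    stack = [iter(l)]  # one iterator per nesting level still being walked
    while stack:
        for elem in stack[-1]:
            if isinstance(elem, list):
                stack.append(iter(elem))  # descend now, resume this level later
                break
            result.append(elem)
        else:
            stack.pop()  # current level exhausted
    return result

print(flatten([1, 2, [3, 4], [[5], [6]]]))  # [1, 2, 3, 4, 5, 6]
```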

How to merge smaller sub-elements into larger parent-elements in a list?

I have a list of lists, but some lists are "sublists" of other lists. What I want to do is remove the sublists from the larger list so that we only have the largest unique sublists.
For example:
>>> some_list = [[1], [1, 2], [1, 2, 3], [1, 4]]
>>> ideal_list = [[1, 2, 3], [1, 4]]
The code that I've written right now is:
new_list = []
for i in range(len(some_list)):
    for j in range(i + 1, len(some_list)):
        count = 0
        for k in some_list[i]:
            if k in some_list[j]:
                count += 1
        if count == len(some_list[i]):
            new_list.append(some_list[j])
The basic algorithm that I had in mind is that we'd check if a list's elements were in the following sublists, and if so then we use the other larger sublist. It doesn't give the desired output (it actually gives [[1, 2], [1, 2, 3], [1, 4], [1, 2, 3]]) and I'm wondering what I could do to achieve what I want.
I don't want to use sets because duplicate elements matter.
Same idea as with sets, but using Counter instead, so duplicate elements are taken into account. The sublist check should be a lot more efficient than brute force:
from collections import Counter
new_list = []
counters = []
for arr in sorted(some_list, key=len, reverse=True):
    arr_counter = Counter(arr)
    if any((c & arr_counter) == arr_counter for c in counters):
        continue  # it is a sublist of something else
    new_list.append(arr)
    counters.append(arr_counter)
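The check `(c & arr_counter) == arr_counter` works because `&` on two `Counter`s keeps the minimum count of each element, so the intersection equals `arr_counter` only if `c` holds at least as many copies of every element. A quick check of the duplicate-sensitive behaviour, using a hypothetical helper `is_submultiset`:

```python
from collections import Counter

def is_submultiset(small, big):
    # Hypothetical helper: the same test as in the answer above
    s = Counter(small)
    return (Counter(big) & s) == s

print(is_submultiset([1, 2], [1, 2, 3]))  # True
print(is_submultiset([1, 1], [1, 2, 3]))  # False: [1, 2, 3] has only one 1
print(set([1, 1]) <= set([1, 2, 3]))      # True: a set forgets the duplicate
```

On Python 3.10 and later, `Counter` also supports `<=` directly, so `Counter(small) <= Counter(big)` expresses the same inclusion test.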
With some inspiration from @mkrieger1's comment, one possible solution would be:
def merge_sublists(some_list):
    new_list = []
    for i in range(len(some_list)):
        true_or_false = []
        for j in range(len(some_list)):
            if some_list[j] == some_list[i]:
                continue
            true_or_false.append(all([x in some_list[j] for x in some_list[i]]))
        if not any(true_or_false):
            new_list.append(some_list[i])
    return new_list
As is stated in the comment, a brute-force solution would be to loop through each element and check if it's a sublist of any other sublist. If it's not, then append it to the new list.
Test cases:
>>> merge_sublists([[1], [1, 2], [1, 2, 3], [1, 4]])
[[1, 2, 3], [1, 4]]
>>> merge_sublists([[1, 2, 3], [4, 5], [3, 4]])
[[1, 2, 3], [4, 5], [3, 4]]
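One caveat: `some_list[j] == some_list[i]` also skips identical copies, so a list that appears twice is kept twice. A sketch of a variant (the name `merge_sublists_dedup` is hypothetical) that compares positions instead of values, keeps only the first of any duplicates, and uses `Counter` so repeated elements still matter, as the question requires:

```python
from collections import Counter

def merge_sublists_dedup(some_list):
    # Hypothetical variant of merge_sublists: compares positions, not values
    counters = [Counter(x) for x in some_list]
    new_list = []
    for i, ci in enumerate(counters):
        contained = False
        for j, cj in enumerate(counters):
            if i == j:
                continue
            if (cj & ci) == ci and (ci != cj or j < i):
                # strictly smaller multiset, or an identical one seen earlier
                contained = True
                break
        if not contained:
            new_list.append(some_list[i])
    return new_list

print(merge_sublists_dedup([[1], [1, 2], [1, 2, 3], [1, 4]]))  # [[1, 2, 3], [1, 4]]
print(merge_sublists_dedup([[1, 2], [1, 2], [3]]))             # [[1, 2], [3]]
print(merge_sublists_dedup([[1, 1], [1]]))                     # [[1, 1]]
```

Because the comparison is on multisets, [1, 2] and [2, 1] are treated as duplicates of each other.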
Input:
l = [[1], [1, 2], [1, 2, 3], [1, 4]]
One way here:
l1 = l.copy()
for i in l:
    for j in l:
        if set(i).issubset(set(j)) and i!=j:
            l1.remove(i)
            break
Printing the result:
print(l1)
[[1, 2, 3], [1, 4]]
EDIT: (Taking care of duplicates as well)
l1 = [list(tupl) for tupl in {tuple(item) for item in l }]
l2 = l1.copy()
for i in l1:
    for j in l1:
        if set(i).issubset(set(j)) and i!=j:
            l2.remove(i)
            break

Swap two values in a list of lists (python)

I want to swap every occurrence of two elements in a list of lists. I have seen previous answers but they are different from what I'm trying to do. Also, I am looking for a clean and concise way (preferably list-comprehension) for this purpose.
Input
my_list = [[1,2], [1,3], [1,4], [3,2]]
Output
my_list = [[2,1], [2,3], [2,4], [3,1]]
I am trying to do something like this but no success:
[1 if i==2 else i, 2 if i==1 else i for i in my_list]
Here's a simple solution using list comprehension:
my_list = [[1,2], [1,3], [1,4], [3,2]]
a = 1
b = 2
my_list = [[a if i==b else b if i==a else i for i in j] for j in my_list]
print(my_list) # [[2, 1], [2, 3], [2, 4], [3, 1]]
If you want to swap more elements, you can use a dictionary:
swap = {
    1: 2,
    2: 1
}
my_list = [[swap.get(i, i) for i in j] for j in my_list]
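Because every element is looked up exactly once in the original list, the mapping is applied simultaneously, so no temporary value is needed as it would be with sequential replacements; `swap.get(i, i)` returns `i` itself for values that are not keys. A sketch showing the same comprehension with a hypothetical three-way rotation:

```python
my_list = [[1,2], [1,3], [1,4], [3,2]]

swap = {1: 2, 2: 1}
swapped = [[swap.get(i, i) for i in j] for j in my_list]

# Hypothetical mapping: rotate 1 -> 2 -> 3 -> 1, other values unchanged
rotate = {1: 2, 2: 3, 3: 1}
rotated = [[rotate.get(i, i) for i in j] for j in my_list]

print(swapped)  # [[2, 1], [2, 3], [2, 4], [3, 1]]
print(rotated)  # [[2, 3], [2, 1], [2, 4], [1, 3]]
```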
Someone else will probably answer with something better, but this would work.
def check(num):
    if num == 1:
        return 2
    elif num == 2:
        return 1
    else:
        return num
out = [[check(j) for j in i] for i in my_list]
If you need to swap 1's with 2's, this one-liner will work:
>>> my_list = [[1,2], [1,3], [1,4], [3,2]]
>>> print([[(lambda k: (1 if k == 2 else 2) if k in [1, 2] else k)(val) for val in sub_list] for sub_list in my_list])
[[2, 1], [2, 3], [2, 4], [3, 1]]
Read the second line from right to left... in chunks!

Python: Appending numerous 2D list to one 2D list

For the sake of simplicity I will just use two lists.
So I have the following 2D lists:
a = [[1,2,3],
     [1,2,3]]
b = [[4,5,6],
     [4,5,6]]
If I append lists a and b, I'm looking to obtain the following:
masterlist = [[1,4], [2,5], [3,6], [1,4], [2,5], [3,6]]
The following code is what I have tried:
filenames = [a,b] #For the example, since I will have multiple arrays
masterlist = []
counter = 0
for file in filenames:
    if counter == 0: #This if is to try to create lists within the list
        for row in file: #This loops is to iterate throughout the whole list
            for col in row:
                c = [masterlist.append([col])]
        [masterlist.append(c) for row, col in zip(masterlist, c)]
        counter = 1
    else: #This else is to append each element to their respective position
        for row in file:
            for col in row:
                c = [masterlist.append(col)]
        [masterlist.append([c]) for row, col in zip(masterlist, c)]
The output when printing masterlist is the following:
[[1], [2], [3], [1], [2], [3], [None], 4, 5, 6, 4, 5, 6, [[None]]]
I'm not sure where the [None]'s come from either. And as we can see '4,5,6...' aren't appended to the lists '[1], [2], [3]...' respectively.
You can iterate through the items of the lists and then add them to your masterlist:
a = [[1,2,3],
     [1,2,3]]
b = [[4,5,6],
     [4,5,6]]
masterlist = []
for aa,bb in zip(a,b): # loop over lists
    for itema, itemb in zip(aa,bb): # loop over items in list
        masterlist = masterlist + [[itema, itemb]]
Output:
[[1, 4], [2, 5], [3, 6], [1, 4], [2, 5], [3, 6]]
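Since the question mentions having multiple arrays, the same pairing can be extended to any number of 2D lists by unpacking them into `zip`. A sketch, assuming all lists have the same shape (`zip` silently truncates otherwise); `interleave` and the third list `c` are hypothetical:

```python
a = [[1, 2, 3],
     [1, 2, 3]]
b = [[4, 5, 6],
     [4, 5, 6]]
c = [[7, 8, 9],
     [7, 8, 9]]

def interleave(*lists):
    # zip(*lists) pairs up row k of every list; zip(*rows) then pairs the entries column by column
    return [list(t) for rows in zip(*lists) for t in zip(*rows)]

print(interleave(a, b))     # [[1, 4], [2, 5], [3, 6], [1, 4], [2, 5], [3, 6]]
print(interleave(a, b, c))  # [[1, 4, 7], [2, 5, 8], [3, 6, 9], [1, 4, 7], [2, 5, 8], [3, 6, 9]]
```

With the question's `filenames = [a, b]`, this would be called as `interleave(*filenames)`.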
If you use NumPy, this is really easy:
import numpy as np
a = np.array([[1,2,3],
              [1,2,3]])
b = np.array([[4,5,6],
              [4,5,6]])
fl = np.vstack(np.dstack((a,b)))
Output:
array([[1, 4],
       [2, 5],
       [3, 6],
       [1, 4],
       [2, 5],
       [3, 6]])

Combine lists that have at least a number in common in a list of lists, Python

Consider this list of lists:
l = [ [1], [1], [1,2,3], [4,1], [5], [5], [6], [7,8,9], [7,6], [8,5] ]
I want to combine all the lists that have at least one number in common, repeating this iteratively until no more merges are possible, with no duplicates in the result. The result will be:
combine(l) = [ [1,2,3,4], [5,6,7,8,9] ]
Is there a neat way to do this, perhaps with itertools?
out = []
for l in lists:  # lists is the input list of lists
    for o in out:
        if set(l).intersection(set(o)):
            o[:] = list(set(l) | set(o)) # Mutate, don't reassign temp var
            break
    else:
        out.append(list(l))  # copy, so mutating o later doesn't change the input
Not perfectly written, could be optimized and is not sorted, but should give you the idea of how to do it.
Maybe this?
l = [ [1], [1], [1,2,3], [4,1], [5], [5], [6], [7,8,9], [7,6], [8,5] ]
a, b = [], list(map(set, l))
while len(a) != len(b):
    a, b = b, []
    for x in a:
        for i, p in enumerate(b):
            if p & x:
                b[i] = p | x
                break
        else:
            b.append(x)
print(a)
# [set([1, 2, 3, 4]), set([5, 6, 7, 8, 9])]
My naive attempt:
1. Merge all tuples whose intersection is not empty.
2. Sort each tuple.
3. Remove duplicate tuples.
4. Repeat until there are no more changes.
Example:
def combine_helper(l):
    l = map(set, l)
    for i, x in enumerate(l, 1):
        x = set(x)
        for y in l[i:]:
            if x & y:
                x = x | y
        yield tuple(sorted(x))
def combine(l):
    last_l = []
    new_l = l
    while last_l != new_l:
        last_l = new_l
        new_l = list(set(combine_helper(last_l)))
    return map(list, last_l)
l = [ [1], [1], [1,2,3], [4,1], [5], [5], [6], [7,8,9], [7,6], [8,5] ]
print combine(l)
Output:
$ python test.py
[[1, 2, 3, 4], [5, 6, 7, 8, 9]]
Another possible approach:
1. Make a list of 1-element sets, one for each individual value present in the list. This is the output list.
2. The l list is a recipe for how items in the output list should be joined. E.g. if the output list is [{1}, {2}, {3}] and the first element in the l list is [1,3], then all sets in the output list containing 1 or 3 should be joined: output = [{1,3}, {2}].
3. Repeat step 2 for every item in the l list.
Code:
l = [ [1], [1], [1,2,3], [4,1], [5], [5], [6], [7,8,9], [7,6], [8,5] ]
def compose(l):
    # the following will convert l into a list of 1-element sets:
    # e.g. [ {1}, {2}, ... ]
    r = sum(l, [])
    r = [set([x]) for x in set(r)]  # a real list, so it can be scanned twice per item
    # for every item in l
    # find matching sets in r and join them together
    for item in map(set, l):
        outside = [x for x in r if not x & item] # elements untouched
        inside = [x for x in r if x & item] # elements to join
        inside = set([]).union(*inside) # compose sets
        r = outside + [inside]
    return r
Example:
>>> compose(l)
[set([1, 2, 3, 4]), set([8, 9, 5, 6, 7])]
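The singleton-then-join idea above is essentially what a union-find (disjoint-set) structure does incrementally. A sketch of that swapped-in technique, not taken from any answer here; `combine_uf` and `find` are hypothetical names:

```python
def combine_uf(lists):
    parent = {}

    def find(x):
        # Walk to the group's root, halving the path along the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for sub in lists:
        for x in sub:
            parent.setdefault(x, x)  # every value starts as its own group
        for x in sub[1:]:
            # Link each element's group to the group of the sublist's first element
            parent[find(x)] = find(sub[0])

    groups = {}
    for x in parent:
        groups.setdefault(find(x), []).append(x)
    return [sorted(g) for g in groups.values()]

l = [ [1], [1], [1,2,3], [4,1], [5], [5], [6], [7,8,9], [7,6], [8,5] ]
print(combine_uf(l))  # [[1, 2, 3, 4], [5, 6, 7, 8, 9]]
```

Each sublist is visited once, and no repeated passes are needed because merged groups stay linked.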
Recursion to the rescue! And don't forget reduce!
from functools import reduce
input_list = [ [1], [1], [1, 2, 3], [4, 1], [5], [5], [6], [7, 8, 9],
               [7, 6], [8, 5] ]
def combine(input_list):
    input_list = list(map(set, input_list)) # working with sets has some advantages
    reduced_list = reduce(combine_reduce, input_list, [])
    if len(reduced_list) == len(input_list):
        # return the whole thing in the original format (sorted lists)
        return list(map(sorted, map(list, reduced_list)))
    else:
        # recursion happens here
        return combine(reduced_list)
def combine_reduce(reduced_list, numbers):
    '''
    find the set to add the numbers to or append as a new set.
    '''
    for sub_set in reduced_list:
        if sub_set.intersection(numbers):
            sub_set.update(numbers)
            return reduced_list
    reduced_list.append(numbers)
    return reduced_list
print(combine(input_list))
Prints out:
$ python combine.py
[[1, 2, 3, 4], [5, 6, 7, 8, 9]]
We have two things going on here. The first is reduce: I'm using it to cook the list down, by fitting each element into the resulting list somewhere or appending it, if that didn't work. This does not do the whole job, though, so we repeat this process (recursion!) until reducing does not provide a shorter list.
Also, use of set allows for the handy intersection method. You will notice the line with map(set, input_list) is redundant in recursion. Extracting a wrapper function combine from the inner function combine_inner and placing the formatting / unformatting (from list to set and back) in the outer function is left as an exercise.
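One possible solution to that exercise, sketched under the assumption that the split is meant literally: an outer `combine` converts lists to sets once and back to sorted lists at the end, and an inner `combine_inner` repeats the reduction on sets only. The structure is an illustration, not the answer author's code:

```python
from functools import reduce

def combine_reduce(reduced_list, numbers):
    # Merge numbers into the first overlapping set, or start a new set
    for sub_set in reduced_list:
        if sub_set & numbers:
            sub_set.update(numbers)
            return reduced_list
    reduced_list.append(numbers)
    return reduced_list

def combine_inner(sets):
    # Works purely on sets; recurse until a pass merges nothing
    reduced = reduce(combine_reduce, sets, [])
    if len(reduced) == len(sets):
        return reduced
    return combine_inner(reduced)

def combine(input_list):
    # Convert once (fresh sets, so the caller's lists are untouched), unformat once at the end
    return [sorted(s) for s in combine_inner([set(x) for x in input_list])]

input_list = [ [1], [1], [1, 2, 3], [4, 1], [5], [5], [6], [7, 8, 9], [7, 6], [8, 5] ]
print(combine(input_list))  # [[1, 2, 3, 4], [5, 6, 7, 8, 9]]
```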
