Remove duplicated lists in list of lists in Python - python

I've seen some closely related questions here, but their answers don't work for me. I have a list of lists where some sublists are repeated, but their elements may be in a different order. For example:
g = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [9, 0, 1], [4, 3, 2]]
According to my question, the output should naturally be:
g = [[1,2,3],[9,0,1],[4,3,2]]
I've tried using set, but it only removes sublists that are exactly equal (I thought it would work because sets are unordered by definition). The other questions I looked at only have examples with exactly duplicated lists, like this one: Python : How to remove duplicate lists in a list of list?. For now, the order of the output (for the list and the sublists) is not a problem.

A version that (ab)uses side effects in a list comprehension:
seen = set()
[x for x in g if frozenset(x) not in seen and not seen.add(frozenset(x))]
Out[4]: [[1, 2, 3], [9, 0, 1], [4, 3, 2]]
For those (unlike myself) who don't like using side-effects in this manner:
res = []
seen = set()
for x in g:
    x_set = frozenset(x)
    if x_set not in seen:
        res.append(x)
        seen.add(x_set)
The reason that you add frozensets to the set is that you can only add hashable objects to a set, and vanilla sets are not hashable.
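To see why the conversion is needed, here is a minimal sketch (the variable names are illustrative, not taken from the answer above) showing that a plain set is rejected as a set element, while a frozenset is accepted and compares equal regardless of element order:

```python
seen = set()

# A plain set is mutable and therefore unhashable, so it cannot be a set element.
try:
    seen.add({1, 2, 3})
    plain_set_ok = True
except TypeError:
    plain_set_ok = False

# A frozenset is immutable and hashable; equal contents match regardless of order.
seen.add(frozenset([1, 2, 3]))
same_elements_found = frozenset([3, 2, 1]) in seen

print(plain_set_ok, same_elements_found)  # False True
```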

If you don't care about the order for lists and sublists (and all items in sublists are unique):
result = set(map(frozenset, g))
If a sublist may contain duplicates, e.g., [1, 2, 1, 3], then you could use tuple(sorted(sublist)) instead of frozenset(sublist), because frozenset removes duplicates from a sublist.
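As a small illustration of the difference between the two keys (the names a and b are only for this sketch):

```python
a = [1, 2, 1, 3]
b = [1, 2, 3]

# frozenset drops the repeated 1, so the two sublists look identical.
same_as_frozenset = frozenset(a) == frozenset(b)

# A sorted tuple keeps every element, so the two sublists stay distinct.
same_as_sorted_tuple = tuple(sorted(a)) == tuple(sorted(b))

print(same_as_frozenset, same_as_sorted_tuple)  # True False
```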
If you want to preserve the order of sublists:
def del_dups(seq, key=frozenset):
    seen = {}
    pos = 0
    for item in seq:
        if key(item) not in seen:
            seen[key(item)] = True
            seq[pos] = item
            pos += 1
    del seq[pos:]
Example:
del_dups(g, key=lambda x: tuple(sorted(x)))
See In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?

What about using frozenset, as mentioned by roippi, this way:
>>> g = [list(x) for x in set(frozenset(i) for i in [set(i) for i in g])]
[[0, 9, 1], [1, 2, 3], [2, 3, 4]]

I would convert each element in the list to a frozenset (which is hashable), then create a set out of it to remove duplicates:
>>> g = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [9, 0, 1], [4, 3, 2]]
>>> set(map(frozenset, g))
set([frozenset([0, 9, 1]), frozenset([1, 2, 3]), frozenset([2, 3, 4])])
If you need to convert the elements back to lists:
>>> map(list, set(map(frozenset, g)))
[[0, 9, 1], [1, 2, 3], [2, 3, 4]]
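The session above shows Python 2 output. In Python 3, map returns a lazy iterator rather than a list, so a rough equivalent might look like the sketch below. The names unique_sets and unique_lists are made up for this example, the order of the outer list is not guaranteed, and sorted() is used only to make each sublist deterministic:

```python
g = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [9, 0, 1], [4, 3, 2]]

# Each sublist becomes a hashable frozenset; the outer set removes duplicates.
unique_sets = set(map(frozenset, g))

# Convert back to lists; the order of the outer list is arbitrary.
unique_lists = [sorted(s) for s in unique_sets]

print(unique_lists)
```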

Related

How to remove duplicates from the list of tuples? [duplicate]

I have a 2D list which I create like so:
Z1 = [[0 for x in range(3)] for y in range(4)]
I then proceed to populate this list, such that Z1 looks like this:
[[1, 2, 3], [4, 5, 6], [2, 3, 1], [2, 5, 1]]
I need to extract the unique 1x3 elements of Z1, without regard to order:
Z2 = makeUnique(Z1) # The solution
The contents of Z2 should look like this:
[[4, 5, 6], [2, 5, 1]]
As you can see, I consider [1, 2, 3] and [2, 3, 1] to be duplicates because I don't care about the order.
Also note that single numeric values may appear more than once across elements (e.g. [2, 3, 1] and [2, 5, 1]); it's only when all three values appear together more than once (in the same or different order) that I consider them to be duplicates.
I have searched dozens of similar problems, but none of them seems to address my exact issue. I'm a complete Python beginner so I just need a push in the right direction.
I have already tried :
Z2= dict((x[0], x) for x in Z1).values()
Z2= set(i for j in Z2 for i in j)
But this does not produce the desired behaviour.
Thank you very much for your help!
Louis Vallance
If the order of the elements inside the sublists does not matter, you could use the following:
from collections import Counter
z1 = [[1, 2, 3], [4, 5, 6], [2, 3, 1], [2, 5, 1]]
temp = Counter([tuple(sorted(x)) for x in z1])
z2 = [list(k) for k, v in temp.items() if v == 1]
print(z2) # [[4, 5, 6], [1, 2, 5]]
Some remarks:
sorting makes lists [1, 2, 3] and [2, 3, 1] from the example equal so they get grouped by the Counter
casting to tuple converts the lists to something that is hashable and can therefore be used as a dictionary key.
the Counter creates a dict with the tuples created above as keys and a value equal to the number of times they appear in the original list
the final list-comprehension takes all those keys from the Counter dictionary that have a count of 1.
If the order does matter you can use the following instead:
z1 = [[1, 2, 3], [4, 5, 6], [2, 3, 1], [2, 5, 1]]
def test(sublist, list_):
    for sub in list_:
        if all(x in sub for x in sublist):
            return False
    return True
z2 = [x for i, x in enumerate(z1) if test(x, z1[:i] + z1[i+1:])]
print(z2) # [[4, 5, 6], [2, 5, 1]]
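A possible alternative that also keeps the original order, sketched under the assumption that two sublists are duplicates when they hold the same elements with the same multiplicities: count the sorted-tuple keys once, then filter the original list. The name make_unique is borrowed from the placeholder in the question and is hypothetical.

```python
from collections import Counter

def make_unique(rows):
    """Keep only rows whose elements (ignoring order) occur exactly once."""
    counts = Counter(tuple(sorted(row)) for row in rows)
    # Filter the original rows so their order and element order are preserved.
    return [row for row in rows if counts[tuple(sorted(row))] == 1]

z1 = [[1, 2, 3], [4, 5, 6], [2, 3, 1], [2, 5, 1]]
z2 = make_unique(z1)
print(z2)  # [[4, 5, 6], [2, 5, 1]]
```

This makes one pass to count and one pass to filter, instead of comparing every sublist against every other one.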

remove duplicates from 2d lists regardless of order [duplicate]

I have a 2d list
a = [[1, 2], [1, 3], [2, 1], [2, 3], [3, 1], [3, 2]]
How can I get the result:
result = [[1,2],[1,3],[2,3]]
where duplicates are removed regardless of the order of the elements in the inner lists.
In [3]: b = []
In [4]: for aa in a:
   ...:     if not any([set(aa) == set(bb) for bb in b if len(aa) == len(bb)]):
   ...:         b.append(aa)
In [5]: b
Out[5]: [[1, 2], [1, 3], [2, 3]]
Try using a set to keep track of what lists you have seen:
from collections import Counter
a = [[1, 2], [1, 3], [2, 1], [2, 3], [3, 1], [3, 2], [1, 2, 1]]
seen = set()
result = []
for lst in a:
    current = frozenset(Counter(lst).items())
    if current not in seen:
        result.append(lst)
        seen.add(current)
print(result)
Which outputs:
[[1, 2], [1, 3], [2, 3], [1, 2, 1]]
Note: Since lists are not hashable, you can store frozensets of the Counter items to detect duplicates regardless of order. This removes the need to sort at all.
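To make the role of Counter clearer, here is a small sketch (multiset_key is a hypothetical helper name) comparing that key with a plain frozenset. The Counter-based key ignores order but still distinguishes [1, 2] from [1, 2, 1], which a plain frozenset does not:

```python
from collections import Counter

def multiset_key(lst):
    # A frozenset of (element, count) pairs: order-independent, but repeat counts are kept.
    return frozenset(Counter(lst).items())

same_when_reordered = multiset_key([1, 2]) == multiset_key([2, 1])
same_with_extra_repeat = multiset_key([1, 2]) == multiset_key([1, 2, 1])
plain_frozenset_same = frozenset([1, 2]) == frozenset([1, 2, 1])

print(same_when_reordered, same_with_extra_repeat, plain_frozenset_same)  # True False True
```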
You can try
a = [[1, 2], [1, 3], [2, 1], [2, 3], [3, 1], [3, 2]]
aa = [tuple(sorted(elem)) for elem in a]
set(aa)
output
{(1, 2), (1, 3), (2, 3)}
The 'Set' concept would come in handy here. The list you have (which contains duplicates) can be converted to a Set (which will never contain a duplicate).
Find more about Sets here : Set
Example :
l = ['foo', 'foo', 'bar', 'hello']
A set can be created directly:
s = set(l)
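Now if you check the contents of the set:
print(s)
{'foo', 'bar', 'hello'}
set() works this way with any iterable whose elements are hashable. Lists are not hashable, so the inner lists of a 2D list would first need to be converted to tuples or frozensets.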
Hope it helps!

Operations on sub-lists in list

I have a list of lists:
a = [[1, 2], [2, 3], [4, 3]]
How can I get the following result in two steps?
1. b = [[1, 2, 2, 3], [1, 2, 4, 3], [2, 3, 4, 3]]
2. b = [[1, 2, 3], [1, 2, 4, 3]], which means:
2.1. If the same values occur next to each other in a sub-list b[i], then one of these values must be deleted.
2.2. If the same values appear in a given sub-list b[i] but not next to each other, then the entire sub-list b[i] must be deleted.
timegb is right. An elegant solution involves some amount of trickery and deception. I'll try and break down the steps.
find all 2-combinations of your input using itertools.combinations
flatten returned combinations with map and chain
for each combination, group by consecutive elements
keep only those that satisfy your condition by doing a length check.
from itertools import chain, combinations, groupby
out = []
for r in map(lambda x: list(chain.from_iterable(x)), combinations(a, 2)):
    j = [i for i, _ in groupby(r)]
    if len(j) <= len(set(r)):
        out.append(j)
print(out)
[[1, 2, 3], [1, 2, 4, 3]]
If you need only the first part, just find combinations and flatten:
out = list(map(lambda x: list(chain.from_iterable(x)), combinations(a, 2)))
print(out)
[[1, 2, 2, 3], [1, 2, 4, 3], [2, 3, 4, 3]]
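To illustrate the grouping step and the length check on their own, here is a small sketch (collapse_runs and has_non_adjacent_repeat are hypothetical helper names). itertools.groupby collapses runs of equal adjacent values, so if the collapsed list is still longer than the number of distinct values, some value must repeat in a non-adjacent position:

```python
from itertools import groupby

def collapse_runs(values):
    # groupby yields one key per run of equal adjacent values.
    return [key for key, _ in groupby(values)]

def has_non_adjacent_repeat(values):
    # After collapsing runs, any leftover repetition is non-adjacent.
    return len(collapse_runs(values)) > len(set(values))

adjacent = [1, 2, 2, 3]
non_adjacent = [2, 3, 4, 3]

print(collapse_runs(adjacent), has_non_adjacent_repeat(adjacent))          # [1, 2, 3] False
print(collapse_runs(non_adjacent), has_non_adjacent_repeat(non_adjacent))  # [2, 3, 4, 3] True
```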

make new python lists with indexes from items in a single list

I want to take a list, e.g. [0, 1, 0, 1, 2, 2, 3], and make a list of lists of (index + 1) for each of the unique elements. For the above, for example, it would be [[1, 3], [2, 4], [5, 6], [7]].
Right now my solution is the ultimate in clunkiness:
list_1 = [0, 1, 0, 1, 2, 2, 3]
maximum = max(list_1)
master = []
for i in range(maximum + 1):
    temp_list = []
    for j, k in enumerate(list_1):
        if k == i:
            temp_list.append(j + 1)
    master.append(temp_list)
print(master)
Any thoughts on a more pythonic way to do this would be much appreciated!
I would do this in two steps:
Build a map {value: [list, of, indices], ...}:
index_map = {}
for index, value in enumerate(list_1):
    index_map.setdefault(value, []).append(index+1)
Extract the value lists from the dictionary into your master list:
master = [index_map.get(index, []) for index in range(max(index_map)+1)]
For your example, this would give:
>>> index_map
{0: [1, 3], 1: [2, 4], 2: [5, 6], 3: [7]}
>>> master
[[1, 3], [2, 4], [5, 6], [7]]
This implementation iterates over the whole list only once (O(n), where n is len(list_1)), whereas the others so far iterate over the whole list once for each unique element (O(n*m), where m is len(set(list_1))). By taking max(index_map) rather than max(list_1), you only need to iterate over the unique values, which is also more efficient.
The implementation can be slightly simpler if you make index_map a collections.defaultdict(list).
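A sketch of the defaultdict variant mentioned above, reusing the name index_map from the earlier code. The start=1 argument to enumerate replaces the manual index+1, and .get is used in the second step so that looking up a missing value does not insert an empty list into the defaultdict:

```python
from collections import defaultdict

list_1 = [0, 1, 0, 1, 2, 2, 3]

# Missing keys start out as empty lists, so no setdefault call is needed.
index_map = defaultdict(list)
for index, value in enumerate(list_1, start=1):
    index_map[value].append(index)

# .get avoids creating new entries for values that never occurred.
master = [index_map.get(value, []) for value in range(max(index_map) + 1)]
print(master)  # [[1, 3], [2, 4], [5, 6], [7]]
```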
list_1 = [0, 1, 0, 1, 2, 2, 3]
master = []
for i in range(max(list_1)+1):
    master.append([j+1 for j,k in enumerate(list_1) if k==i])
master = [[i+1 for i in range(len(list_1)) if list_1[i]==j] for j in range(max(list_1)+1)]
It is just the same as your current code, but it uses a list comprehension, which is often a good Pythonic way to solve this kind of problem.

Removing commutative pairs in a list in Python

I got a list as follows:
list_1 = [[3, 0], [0, 3], [3, 4]]
I'm trying to filter out the commutative elements in this. For example, [3,0] and [0,3] are the same and I need to keep only one of them. I tried converting this into a set, and it didn't help. I also tried iterating, but it's causing real overhead. Is there any Pythonic way to do this?
Thanks.
For example, you can use dict comprehension:
>>> {tuple(sorted(t)): t for t in list_1}.values()
[[0, 3], [3, 4]]
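Note that the dict comprehension keeps the last pair seen for each key, and that the output above is from Python 2. A sketch for Python 3.7+ (where dicts preserve insertion order and .values() returns a view that needs list()), with a setdefault variant that keeps the first pair instead. The names keep_last, first_seen and keep_first are made up for this example:

```python
list_1 = [[3, 0], [0, 3], [3, 4]]

# Later pairs overwrite earlier ones that share a key, so [0, 3] replaces [3, 0].
keep_last = list({tuple(sorted(t)): t for t in list_1}.values())

# setdefault only stores a pair the first time its key is seen.
first_seen = {}
for t in list_1:
    first_seen.setdefault(tuple(sorted(t)), t)
keep_first = list(first_seen.values())

print(keep_last)   # [[0, 3], [3, 4]]
print(keep_first)  # [[3, 0], [3, 4]]
```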
You can use a set of frozensets for the filtering.
If order does not matter:
>>> map(list, set(frozenset(t) for t in list_1))
[[3, 4], [0, 3]]
To retain order:
list_1 = [[3, 0], [0, 3], [3, 4]]
seen = set()
filtered = []
for item in list_1:
    item_set = frozenset(item)
    if item_set not in seen:
        filtered.append(item)
        seen.add(item_set)
Result:
>>> filtered
[[3, 0], [3, 4]]
