remove duplicates from 2d lists regardless of order [duplicate] - python

I have a 2d list
a = [[1, 2], [1, 3], [2, 1], [2, 3], [3, 1], [3, 2]]
How can I get the result:
result = [[1,2],[1,3],[2,3]]
Where duplicates are removed regardless of the order of elements within the inner lists.

In [3]: b = []

In [4]: for aa in a:
   ...:     if not any(set(aa) == set(bb) for bb in b if len(aa) == len(bb)):
   ...:         b.append(aa)

In [5]: b
Out[5]: [[1, 2], [1, 3], [2, 3]]
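One caveat worth noting: set(aa) == set(bb) ignores how many times each value occurs, so two different same-length lists can collide. A small illustration, with a hypothetical input that is not from the question:

In [6]: a = [[1, 1, 2], [1, 2, 2]]

In [7]: b = []

In [8]: for aa in a:
   ...:     if not any(set(aa) == set(bb) for bb in b if len(aa) == len(bb)):
   ...:         b.append(aa)

In [9]: b
Out[9]: [[1, 1, 2]]

Here [1, 2, 2] is dropped as a "duplicate" of [1, 1, 2] even though they differ as multisets; the Counter-based answer below handles that case.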

Try using a set to keep track of what lists you have seen:
from collections import Counter
a = [[1, 2], [1, 3], [2, 1], [2, 3], [3, 1], [3, 2], [1, 2, 1]]
seen = set()
result = []
for lst in a:
    current = frozenset(Counter(lst).items())
    if current not in seen:
        result.append(lst)
        seen.add(current)
print(result)
Which outputs:
[[1, 2], [1, 3], [2, 3], [1, 2, 1]]
Note: since lists are not hashable, you can store frozensets of Counter items to detect order-insensitive duplicates. This removes the need to sort at all.
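To see why the Counter items matter, compare the keys built for [1, 2, 1] and [2, 1]: a plain frozenset of the elements would make them equal, but the Counter keys differ. A quick check (frozenset display order may vary):

from collections import Counter
print(frozenset([1, 2, 1]) == frozenset([2, 1]))  # True: multiplicity is lost
print(frozenset(Counter([1, 2, 1]).items()))      # frozenset({(1, 2), (2, 1)})
print(frozenset(Counter([2, 1]).items()))         # frozenset({(2, 1), (1, 1)})

That is why [1, 2, 1] survives in the output above instead of being collapsed into one of the two-element lists.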

You can try
a = [[1, 2], [1, 3], [2, 1], [2, 3], [3, 1], [3, 2]]
aa = [tuple(sorted(elem)) for elem in a]
set(aa)
Output:
{(1, 2), (1, 3), (2, 3)}
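If you need lists back rather than tuples, one more comprehension does it; note that iteration order over a set is arbitrary:

result = [list(t) for t in set(aa)]
print(result)  # e.g. [[1, 2], [1, 3], [2, 3]]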

The set concept comes in handy here: the list you have (which contains duplicates) can be converted to a set, which will never contain a duplicate.
You can find more about sets in the Python documentation.
Example :
l = ['foo', 'foo', 'bar', 'hello']
A set can be created directly:
s = set(l)
now if you check the contents of the set
print(s)
>>> {'foo', 'bar', 'hello'}
A set can be built this way from any iterable, as long as its elements are hashable!
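That hashability requirement is why this does not directly apply to the 2D case in the question: inner lists are unhashable, so they would first need to be converted, e.g. to sorted tuples as in the earlier answers. A minimal sketch:

ll = [[1, 2], [2, 1], [1, 3]]
print(set(tuple(sorted(x)) for x in ll))  # {(1, 2), (1, 3)}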
Hope it helps!

Related

Fastest way to find all elements that maximize / minimize a function in a Python list

Let's use a simple example: say I have a list of lists
ll = [[1, 2], [1, 3], [2, 3], [1, 2, 3], [2, 3, 4]]
and I want to find all longest lists, which means all lists that maximize the len function. Of course we can do
def func(x):
    return len(x)

maxlen = func(max(ll, key=lambda x: func(x)))
res = [l for l in ll if func(l) == maxlen]
print(res)
Output
[[1, 2, 3], [2, 3, 4]]
But I wonder if there is a more efficient way to do this, especially when the function is very expensive or the list is very long. Any suggestions?
From a computer science/algorithms perspective, this is a very classical "reduce" problem.
So, concretely (it's honestly very straightforward), with metric() being any mapping from elements to non-negative numbers, len in the question:

def metric(element):
    return len(element)

winner = []
maxmetric = 0
for element in ll:
    m = metric(element)        # evaluate the (possibly expensive) metric once
    if m > maxmetric:
        winner = [element]     # strictly better: restart the winners list
        maxmetric = m
    elif m == maxmetric:
        winner.append(element) # tie with the current best: keep it too
when the function is very expensive
Note that you compute func(x) for each element twice, first here:
maxlen = func(max(ll, key=lambda x: func(x)))
then here:
res = [l for l in ll if func(l) == maxlen]
so it would be beneficial to store what was already computed. functools.lru_cache allows that easily: just replace

def func(x):
    return len(x)

with

import functools

@functools.lru_cache(maxsize=None)
def func(x):
    return len(x)
However, beware: because of the way the cached values are stored, the argument(s) must be hashable, so in your example you would first need to convert the lists, e.g. to tuples, i.e.
ll = [(1, 2), (1, 3), (2, 3), (1, 2, 3), (2, 3, 4)]
See the description in the docs for further discussion.
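Putting the two pieces together, a minimal sketch using the tuple-converted ll from above:

import functools

@functools.lru_cache(maxsize=None)
def func(x):
    return len(x)

ll = [(1, 2), (1, 3), (2, 3), (1, 2, 3), (2, 3, 4)]
maxlen = max(func(x) for x in ll)           # each func(x) is computed once here...
res = [l for l in ll if func(l) == maxlen]  # ...and served from the cache here
print(res)  # [(1, 2, 3), (2, 3, 4)]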
Why not use a dictionary, like below? (This is O(n).)
ll = [[1, 2], [1, 3], [2, 3], [1, 2, 3], [2, 3, 4]]
from collections import defaultdict
dct = defaultdict(list)
for l in ll:
    dct[len(l)].append(l)
dct[max(dct)]
Output:
[[1, 2, 3], [2, 3, 4]]
>>> dct
defaultdict(list, {2: [[1, 2], [1, 3], [2, 3]], 3: [[1, 2, 3], [2, 3, 4]]})
Or use setdefault, without defaultdict, like below:
ll = [[1, 2], [1, 3], [2, 3], [1, 2, 3], [2, 3, 4]]
dct = {}
for l in ll:
    dct.setdefault(len(l), []).append(l)
Output:
>>> dct
{2: [[1, 2], [1, 3], [2, 3]], 3: [[1, 2, 3], [2, 3, 4]]}
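As with the defaultdict version, the longest lists can then be read off with dct[max(dct)]:
>>> dct[max(dct)]
[[1, 2, 3], [2, 3, 4]]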

Python: remove duplicates in list of list ignoring list order

I have a list of lists like this
[[], [1, 2, 2], [1], [2], [2], [1, 2], [1, 2], [2, 1], [2, 2]]
I want to remove all duplicates, where the order does not matter, so in the list above I need to remove [2], [1,2], and [2,1].
I thought I can do this with Counter()
from collections import Counter
counter_list = []
no_duplicates = []
for sub_list in all_subsets:
    counter_dic = Counter(sub_list)
    if counter_dic in counter_list:
        pass
    else:
        no_duplicates.append(list(sub_list))
        counter_list.append(counter_dic)
which works fine... but it is the slowest part of my code. I was wondering whether there is a faster way to do this?
You can convert the Counter objects to frozensets, which are hashable and can be put in a set themselves for linear savings on the in check:
from collections import Counter
counters = set()
no_duplicates = []
for sub_list in all_subsets:
    c = frozenset(Counter(sub_list).items())
    if c not in counters:
        counters.add(c)
        no_duplicates.append(list(sub_list))
Doing this with a dict comprehension also looks cool:
no_duplicates = list(
    {frozenset(Counter(l).items()): l for l in all_subsets}.values())
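One subtlety: when two lists map to the same key, the dict comprehension keeps the last one seen, whereas the loop above keeps the first. For the sample input that means [2, 1] survives rather than [1, 2]:

from collections import Counter
all_subsets = [[], [1, 2, 2], [1], [2], [2], [1, 2], [1, 2], [2, 1], [2, 2]]
print(list({frozenset(Counter(l).items()): l for l in all_subsets}.values()))
# [[], [1, 2, 2], [1], [2], [2, 1], [2, 2]]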
If you don't want to use the collections module, you can also try a simple solution like this:
lsts = [[], [1, 2, 2], [1], [2], [2], [1, 2], [1, 2], [2, 1], [2, 2]]
counts = {}
for sublist in lsts:
    key = tuple(sorted(sublist))
    counts[key] = counts.get(key, 0) + 1

result = []
for sublist in lsts:
    key = tuple(sorted(sublist))
    if counts[key] == 1:
        result.append(sublist)
print(result)
Which outputs:
[[], [1, 2, 2], [1], [2, 2]]
Why use any external module, and why make it so complex, when you can do it in just a few lines of code:
data_ = [[], [1, 2, 2], [1], [2], [2], [1, 2], [1, 2], [2, 1], [2, 2]]
dta_dict = {}
for j, i in enumerate(data_):
    if tuple(sorted(i)) not in dta_dict:
        dta_dict[tuple(sorted(i))] = [j]
    else:
        dta_dict[tuple(sorted(i))].append(j)
print(dta_dict.keys())
output:
dict_keys([(1, 2), (), (1,), (2, 2), (1, 2, 2), (2,)])
if you want lists instead of tuples:
print(list(map(list, dta_dict.keys())))
output:
[[1, 2], [], [1], [2, 2], [1, 2, 2], [2]]

Slicing list of lists in Python

I need to slice a list of lists:
A = [[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]]
idx = slice(0,4)
B = A[:][idx]
The code above isn't giving me the right output.
What I want is: [[1,2,3],[1,2,3],[1,2,3]]
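For context on why the original attempt fails: A[:] only makes a shallow copy of the outer list, so A[:][idx] slices the list of rows (here returning all three rows unchanged) instead of slicing each row:

A = [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]
print(A[:][slice(0, 4)])  # [[1, 2, 3, 4, 5], [1, 2, 3, 4, 5], [1, 2, 3, 4, 5]]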
Only rarely are slice objects easier to read than a list comprehension, and this is not one of those cases.
>>> A = [[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]]
>>> [sublist[:3] for sublist in A]
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
This is very clear. For every sublist in A, give me the list of the first three elements.
With numpy it is very simple - you could just perform the slice:
In [1]: import numpy as np
In [2]: A = np.array([[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]])
In [3]: A[:,:3]
Out[3]:
array([[1, 2, 3],
[1, 2, 3],
[1, 2, 3]])
You could, of course, transform numpy.array back to the list:
In [4]: A[:,:3].tolist()
Out[4]: [[1, 2, 3], [1, 2, 3], [1, 2, 3]]
A = [[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]]
print([a[:3] for a in A])
Using list comprehension
you can use a list comprehension such as [x[0:i] for x in A], where i is the number of elements you need (3 here).
Either:
>>> [a[slice(0,3)] for a in A]
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
Or:
>>> [list(filter(lambda x: x <= 3, a)) for a in A]
[[1, 2, 3], [1, 2, 3], [1, 2, 3]]
Note that this second version filters by value rather than by position, so it only matches here because the values happen to line up with their positions.
I am new to programming, and Python is my first language; I started learning it only 4 or 5 days ago. I just learned about lists and slicing, and while looking for examples I found your problem and tried to solve it. Kindly let me know if my code is correct.
Here is my code
A = [[1,2,3,4,5],[1,2,3,4,5],[1,2,3,4,5]]
print(A[0][0:3], A[1][0:3], A[2][0:3])

Ranking a list without breaking ties in Python

I need to attribute ranks to elements of a list while making sure tied elements get the same rank.
For instance:
data = [[1],[3],[2],[2]]
c = 0
for i in sorted(data, reverse=True):
    i.append(c+1)
    c += 1
print data
returns:
[[1, 4], [3, 1], [2, 2], [2, 3]]
Where a rank is appended to the score.
What would I need to change to this simple code to instead obtain:
[[1, 3], [3, 1], [2, 2], [2, 2]]
Where elements scoring 2 are tied and both obtain the second place, while 1, the previously fourth place, is promoted to third place?
Using itertools.groupby, enumerate:
>>> from itertools import groupby
>>> data = [[1],[3],[2],[2]]
>>> sorted_data = sorted(data, reverse=True)
>>> for rank, (_, grp) in enumerate(groupby(sorted_data, key=lambda xs: xs[0]), 1):
...     for x in grp:
...         x.append(rank)
...
>>> print data
[[1, 3], [3, 1], [2, 2], [2, 2]]
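The same dense ranking can be built without groupby by ranking the distinct scores first and then looking each one up. A sketch, assuming the score is the first element of each sublist:

data = [[1], [3], [2], [2]]
order = sorted({d[0] for d in data}, reverse=True)     # [3, 2, 1]
rank = {score: i for i, score in enumerate(order, 1)}  # {3: 1, 2: 2, 1: 3}
for d in data:
    d.append(rank[d[0]])
print(data)  # [[1, 3], [3, 1], [2, 2], [2, 2]]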

Remove duplicated lists in list of lists in Python

I've seen some questions here very related but their answer doesn't work for me. I have a list of lists where some sublists are repeated but their elements may be disordered. For example
g = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [9, 0, 1], [4, 3, 2]]
The output should be, naturally according to my question:
g = [[1,2,3],[9,0,1],[4,3,2]]
I've tried with set, but it only removes lists that are exactly equal (I thought it should work because sets are by definition unordered). Other questions I visited only have examples with lists that are exactly duplicated, like this one: Python : How to remove duplicate lists in a list of list?. For now, the order of the output (for the list and the sublists) is not a problem.
(ab)using side-effects version of a list comp:
seen = set()
[x for x in g if frozenset(x) not in seen and not seen.add(frozenset(x))]
Out[4]: [[1, 2, 3], [9, 0, 1], [4, 3, 2]]
For those (unlike myself) who don't like using side-effects in this manner:
res = []
seen = set()
for x in g:
    x_set = frozenset(x)
    if x_set not in seen:
        res.append(x)
        seen.add(x_set)
The reason you add frozensets to the set is that you can only add hashable objects to a set, and neither lists nor vanilla sets are hashable.
If you don't care about the order for lists and sublists (and all items in sublists are unique):
result = set(map(frozenset, g))
If a sublist may have duplicates e.g., [1, 2, 1, 3] then you could use tuple(sorted(sublist)) instead of frozenset(sublist) that removes duplicates from a sublist.
If you want to preserve the order of sublists:
def del_dups(seq, key=frozenset):
    seen = {}
    pos = 0
    for item in seq:
        if key(item) not in seen:
            seen[key(item)] = True
            seq[pos] = item
            pos += 1
    del seq[pos:]
Example:
del_dups(g, key=lambda x: tuple(sorted(x)))
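For the g from the question, this deduplicates in place, keeping the first occurrence of each orderless group:

>>> g = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [9, 0, 1], [4, 3, 2]]
>>> del_dups(g, key=lambda x: tuple(sorted(x)))
>>> g
[[1, 2, 3], [9, 0, 1], [4, 3, 2]]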
See In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?
What about using the frozenset mentioned by roippi, this way:
>>> g = [list(x) for x in set(frozenset(i) for i in [set(i) for i in g])]
>>> g
[[0, 9, 1], [1, 2, 3], [2, 3, 4]]
I would convert each element in the list to a frozenset (which is hashable), then create a set out of it to remove duplicates:
>>> g = [[1, 2, 3], [3, 2, 1], [1, 3, 2], [9, 0, 1], [4, 3, 2]]
>>> set(map(frozenset, g))
set([frozenset([0, 9, 1]), frozenset([1, 2, 3]), frozenset([2, 3, 4])])
If you need to convert the elements back to lists:
>>> map(list, set(map(frozenset, g)))
[[0, 9, 1], [1, 2, 3], [2, 3, 4]]
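(This transcript is Python 2. In Python 3, map returns an iterator, so wrap the final step in list, e.g. list(map(list, set(map(frozenset, g)))); the set reprs also look like {frozenset({0, 9, 1}), ...} instead, and the order of elements may vary.)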
