Remove duplicates from a nested list - python

I have a list
A = [['1'],['1','2'],['1','2','3','1','2'],['3','3','3']]
and I want to turn it into
A = [['1'],['1','2'],['1','2','3'],['3']]
i.e. I want to remove the duplicate elements within each sublist.

One-liner (if order doesn't matter):
A = [['1'],['1','2'],['1','2','3','1','2'],['3','3','3']]
A = [list(set(a)) for a in A]
print(A) # => e.g. [['1'], ['2', '1'], ['3', '2', '1'], ['3']] (set order is arbitrary)
One-liner (if order matters):
A = [['1'],['1','2'],['1','2','3','1','2'],['3','3','3']]
A = [sorted(set(a), key=a.index) for a in A]
print(A) # => [['1'], ['1', '2'], ['1', '2', '3'], ['3']]
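As a possible alternative to the order-preserving one-liner, here is a minimal sketch that relies on dict keys being unique and keeping insertion order (guaranteed since Python 3.7). The helper name dedupe_keep_order is made up for illustration, and the approach assumes the elements are hashable, which holds for the strings in the question.

```python
def dedupe_keep_order(items):
    """Return items without duplicates, keeping the first occurrence of each."""
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


A = [['1'], ['1', '2'], ['1', '2', '3', '1', '2'], ['3', '3', '3']]
result = [dedupe_keep_order(a) for a in A]
print(result)  # [['1'], ['1', '2'], ['1', '2', '3'], ['3']]
```

Unlike sorted(set(a), key=a.index), this avoids calling a.index for every element, so it stays linear in the length of each sublist.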

A functional version, with functools:
>>> import functools
>>> A = [['1'],['1','2'],['1','2','3','1','2'],['3','3','3']]
>>> print([functools.reduce(lambda result, x: result if x in result else result + [x], xs, []) for xs in A])
[['1'], ['1', '2'], ['1', '2', '3'], ['3']]
The lambda function adds an element to the result list only if that element is not already in it. This is not very efficient (each membership test scans the result list, and result + [x] builds a new list every time), but it keeps the order of elements.
Also note that in Python 2 you don't need to import functools: reduce is a built-in function.

You can use a generator:
def remove_dups(l):
    for a in l:
        new_l = []
        for b in a:
            if b not in new_l:
                new_l.append(b)
        yield new_l
A = [['1'],['1','2'],['1','2','3','1','2'],['3','3','3']]
print(list(remove_dups(A)))
Output:
[['1'], ['1', '2'], ['1', '2', '3'], ['3']]

Related

Split list in python when same values occurs into a list of sublists

Using Python, I need to split my_list = ['1','2','2','3','3','3','4','4','5'] into a list of sublists such that no sublist contains the same value twice. Correct output = [['1','2','3','4','5'],['2','3','4'],['3']]

Probably not the most efficient approach but effective nonetheless:
my_list = ['1','2','2','3','3','3','4','4','5']
output = []
for e in my_list:
    for f in output:
        if e not in f:
            f.append(e)
            break
    else:  # no break: e is already in every existing sublist
        output.append([e])
print(output)
Output:
[['1', '2', '3', '4', '5'], ['2', '3', '4'], ['3']]

I assumed that the n-th occurrence of each value should go into the n-th sublist, and I sorted the unique values to better match your desired output.
uniques = list(set(my_list))
uniques.sort()
unique_counts = {unique:my_list.count(unique) for unique in uniques}
new_list = []
for _ in range(max(unique_counts.values())):
    new_list.append([])
for unique,count in unique_counts.items():
    for i in range(count):
        new_list[i].append(unique)
The output for new_list is
[['1', '2', '3', '4', '5'], ['2', '3', '4'], ['3']]

By using collections.Counter for recognizing the maximum number of the needed sublists and then distributing consecutive unique keys on sublists according to their frequencies:
from collections import Counter
my_list = ['1','2','2','3','3','3','4','4','5']
cnts = Counter(my_list)
res = [[] for i in range(cnts.most_common(1).pop()[1])]
for k in cnts.keys():
    for j in range(cnts[k]):
        res[j].append(k)
print(res)
Output:
[['1', '2', '3', '4', '5'], ['2', '3', '4'], ['3']]

Here's a way to do it based on getting unique values and counts using list comprehension.
my_list = ['1','2','2','3','3','3','4','4','5']
unique = [val for i,val in enumerate(my_list) if val not in my_list[0:i]]
counts = [my_list.count(val) for val in unique]
output = [[val for val,ct in zip(unique, counts) if ct > i] for i in range(max(counts))]
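For comparison, the same split can probably be done in a single pass, without calling my_list.count for every value. The sketch below is only an illustration: the function name split_by_occurrence is hypothetical, and it assumes the n-th occurrence of a value belongs in the n-th sublist, as in the expected output above.

```python
from collections import defaultdict


def split_by_occurrence(values):
    """Put the n-th occurrence of each value into the n-th sublist."""
    seen_count = defaultdict(int)  # how many times each value has been seen so far
    sublists = []
    for value in values:
        index = seen_count[value]
        if index == len(sublists):
            # First value to reach this occurrence count: open a new sublist.
            sublists.append([])
        sublists[index].append(value)
        seen_count[value] += 1
    return sublists


my_list = ['1', '2', '2', '3', '3', '3', '4', '4', '5']
print(split_by_occurrence(my_list))
# [['1', '2', '3', '4', '5'], ['2', '3', '4'], ['3']]
```

Values keep the order in which they appear in the input, so no sorting step is needed.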

Remove sublist duplicates including reversed

For example, I have the following:
list = [['1', '2'], ['1', '3'], ['1', '4'], ['1', '5'], ['2', '1'], ['4', '1'], ['2', '6']]
I want to check whether a sublist has a reversed counterpart within the same list (i.e. ['1', '2'] = ['2', '1']) and, if so, remove the mirrored one from the list.
The final list should look like:
list = [['1', '2'], ['1', '3'], ['1', '4'], ['1', '5'], ['2', '6']]
This is what I tried:
for i in range(len(list)):
    if list[i] == list[i][::-1]:
        print("Match found")
        del list[i][::-1]
print(list)
But in the end I get the same list as the original. I am not sure if my matching condition is correct.

You could iterate over the sublists and use a set to keep track of those that have been seen so far. A set is the convenient choice here because membership tests take constant time on average; since lists aren't hashable, you'll need to convert each sublist to a tuple. Then keep an item only if neither its tuple nor the reversed tuple has been seen (if you only want to skip items whose reverse has been seen, the condition is just if tuple(reversed(t)) in s):
l = [['1', '2'], ['1', '3'], ['1', '4'], ['1', '5'], ['2', '1'], ['4', '1'], ['2', '6']]
s = set()
out = []
for i in l:
    t = tuple(i)
    if t in s or tuple(reversed(t)) in s:
        continue
    s.add(t)
    out.append(i)
print(out)
# [['1', '2'], ['1', '3'], ['1', '4'], ['1', '5'], ['2', '6']]

lists = [['1', '2'], ['1', '3'], ['1', '4'], ['1', '5'], ['2', '1'], ['4', '1'], ['2', '6']]
for x in lists:
    z = x[::-1]
    if z in lists:
        lists.remove(z)
Explanation: While looping over lists, reverse each element and store it in z. If z exists in lists, remove it using remove().
The problem with your solution is that you compare the element at index i with its own reverse, which is only true for a sublist that reads the same in both directions, so nothing is removed and you get the same list back.

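Approach 1:
List = [['1', '2'], ['1', '3'], ['1', '4'], ['1', '5'], ['2', '1'], ['4', '1'], ['2', '6']]
new_list = []
for l in List:
    if l not in new_list and l[::-1] not in new_list:
        new_list.append(l)
print(new_list)
Approach 2:
You can also try this (a frozenset ignores element order, so any reordering of a sublist counts as a duplicate):
seen = set()
print([x for x in List if frozenset(x) not in seen and not seen.add(frozenset(x))])
Output:
[['1', '2'], ['1', '3'], ['1', '4'], ['1', '5'], ['2', '6']]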
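For two-element sublists, the reversed check and the frozenset check behave the same, but they can differ for longer sublists. The sketch below illustrates the difference; the function names drop_mirrored and drop_same_members are made up for this example, and the three-element input is not from the question.

```python
def drop_mirrored(rows):
    """Keep a row unless the same row or its exact reverse was kept earlier."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row)
        if key in seen or key[::-1] in seen:
            continue
        seen.add(key)
        kept.append(row)
    return kept


def drop_same_members(rows):
    """Keep a row unless an earlier row had the same set of members, in any order."""
    seen = set()
    kept = []
    for row in rows:
        key = frozenset(row)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


rows = [['1', '2', '3'], ['3', '2', '1'], ['2', '1', '3']]
print(drop_mirrored(rows))      # [['1', '2', '3'], ['2', '1', '3']]
print(drop_same_members(rows))  # [['1', '2', '3']]
```

Which of the two is appropriate depends on whether "duplicate" means strictly mirrored or simply containing the same values.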

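my_list = [['1', '2'], ['1', '3'], ['1', '4'], ['1', '5'], ['2', '1'], ['4', '1'], ['2', '6']]
# Lists are not hashable, so each sorted pair is converted to a tuple before building the set.
# Note: this sorts every pair and does not preserve the original order of the list.
my_list = [list(t) for t in set(tuple(sorted(l)) for l in my_list)]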

This is similar to the solution by @Mehul Gupta, but I think their solution traverses the list twice on a match: once for checking and once for removing. Instead, we could do:
the_list = [['1', '2'], ['1', '3'], ['1', '4'], ['1', '5'], ['2', '1'], ['4', '1'], ['2', '6']]
for sub_list in the_list:
    try:
        idx = the_list.index(sub_list[::-1])
    except ValueError:
        continue
    else:
        the_list.pop(idx)
print(the_list)
# [['1', '2'], ['1', '3'], ['1', '4'], ['1', '5'], ['2', '6']]
because it is easier to ask for forgiveness than permission.
Note: Removing elements while looping over a list is generally not a good idea, but for this input it does no harm. In fact, it helps, because the mirrored sublist is already gone and is never checked again. (A sublist that equals its own reverse, such as ['1', '1'], would remove itself, so this assumes there are no such sublists.)

As I wrote in a comment, never use list (or any built-in) as a variable name:
L = [['1', '2'], ['1', '3'], ['1', '4'], ['1', '5'], ['2', '1'], ['4', '1'], ['2', '6']]
Have a look at your code:
for i in range(len(L)):
    if L[i] == L[i][::-1]:
        print("Match found")
        del L[i][::-1]
There are two issues. First, you compare L[i] with L[i][::-1], but you want to compare L[i] with L[j][::-1] for any j != i. Second, you try to delete elements of a list during an iteration. If you delete an element, the list gets shorter and the loop index eventually goes out of the bounds of the list:
>>> L = [1,2,3]
>>> for i in range(len(L)):
...     del L[i]
...
Traceback (most recent call last):
...
IndexError: list assignment index out of range
To fix the first issue, you can iterate twice over the elements: for each element, is there another element that is its reverse? To fix the second issue, you have two options: 1. build a new list; 2. proceed in reverse order, so that the last indices are deleted first.
First version:
new_L = []
for i in range(len(L)):
    for j in range(i):  # only look at earlier elements, so the first of each mirrored pair is kept
        if L[i] == L[j][::-1]:
            print("Match found")
            break
    else: # no break
        new_L.append(L[i])
print(new_L)
Second version:
for i in range(len(L)-1, -1, -1):
    for j in range(0, i):
        if L[i] == L[j][::-1]:
            print("Match found")
            del L[i]
            break  # L[i] is gone, so stop comparing against it
print(L)
(For a better time complexity, see @yatu's answer.)
For a one-liner, you can use the functools module:
>>> L = [['1', '2'], ['1', '3'], ['1', '4'], ['1', '5'], ['2', '1'], ['4', '1'], ['2', '6']]
>>> import functools
>>> functools.reduce(lambda acc, x: acc if x[::-1] in acc else acc + [x], L, [])
[['1', '2'], ['1', '3'], ['1', '4'], ['1', '5'], ['2', '6']]
The logic is the same as the logic of the first version.

You can also try this:
l = [['1', '2'], ['1', '3'], ['1', '4'], ['1', '5'], ['2', '1'], ['4', '1'], ['2', '6']]
res = []
for sub_list in l:
    if sub_list[::-1] not in res:
        res.append(sub_list)
print(res)
Output:
[['1', '2'], ['1', '3'], ['1', '4'], ['1', '5'], ['2', '6']]

'IndexError: list index out of range' during assignment

j = [['4', '5'], ['1', '1'], ['1', '5'], ['3', '4'], ['3', '1']]
k = [['5', '2'], ['4', '2'], ['2', '4'], ['3', '3'], ['4', '3']]
t = []
indexPointer = 0
for coord in j:
    for number in coord:
        t[indexPointer][0] = number
        indexPointer += 1
indexPointer = 0
for coord in k:
    for number in coord:
        t[indexPointer][1] = number
        indexPointer += 1
print(t)
This should output:
[[4,5],[5,2],[1,4],[1,2],[1,2],[5,4],[3,3],[4,3],[3,4],[1,3]]
Instead I get:
    t[indexPointer][0] = number
IndexError: list index out of range
How can I solve this? I've tried to find a way, but without any luck.
Edit:
I didn't include all the necessary code. It has been updated.

You can't index into an empty list, since there's nothing there. You'll either have to append things to it, or prefill it with empty values, e.g.:
t = [None] * 10
But even this won't quite work, since you expect t to be two-dimensional. You may want to try making t a defaultdict, like so:
from collections import defaultdict
t = defaultdict(dict)
t[1][0] = 'a'
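If t should stay a plain list, one option is to prefill it with independent two-element rows before assigning by index. The following is a sketch under the assumption that j and k contain the same total number of values; index_pointer simply mirrors the indexPointer variable from the question.

```python
j = [['4', '5'], ['1', '1'], ['1', '5'], ['3', '4'], ['3', '1']]
k = [['5', '2'], ['4', '2'], ['2', '4'], ['3', '3'], ['4', '3']]

# One independent [None, None] row for every number in j.
n = sum(len(coord) for coord in j)
t = [[None, None] for _ in range(n)]

index_pointer = 0
for coord in j:
    for number in coord:
        t[index_pointer][0] = number
        index_pointer += 1

index_pointer = 0
for coord in k:
    for number in coord:
        t[index_pointer][1] = number
        index_pointer += 1
print(t)

# Pitfall: multiplying the outer list repeats the SAME inner list object.
shared = [[None, None]] * 3
shared[0][0] = 'x'
print(shared)  # [['x', None], ['x', None], ['x', None]]
```

The list comprehension matters here: it creates a new inner list per row, whereas the multiplication form makes every row an alias of one list.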

Why wouldn't this be out of range?
Your t variable is just an empty one-dimensional list that you're trying to access as if it were two-dimensional.
I think you're just trying to add everything that is in j to t? In that case you could do something like this (after import itertools): t = list(itertools.chain(*j))
Edit: Just noticed each element is in its own list: t = [[x] for x in itertools.chain(*j)]

I would recommend the code posted by Pythonista, but to adjust your code to make it work:
j = [['4', '5'], ['1', '1'], ['1', '5'], ['3', '4'], ['3', '1']]
t = []
for coord in j:
    for number in coord:
        t.append([number])
print(t)
#[['4'], ['5'], ['1'], ['1'], ['1'], ['5'], ['3'], ['4'], ['3'], ['1']]
As you're looping through each element the .append list method is tacking on [number] to the end of your list t.
You can accomplish this same nested loop code using a list comprehension:
t = [[number] for coord in j for number in coord]
Update:
Since you've updated your question, you should consider the zip function in this situation:
import itertools
t = list(itertools.chain(*zip(j, k)))
If you want to keep a for loop, you can use the list .extend method:
j = [['4', '5'], ['1', '1'], ['1', '5'], ['3', '4'], ['3', '1']]
k = [['5', '2'], ['4', '2'], ['2', '4'], ['3', '3'], ['4', '3']]
t = []
for coord in zip(j, k):
    t.extend(coord)
print(t)
#[['4', '5'], ['5', '2'], ['1', '1'], ['4', '2'], ['1', '5'], ['2', '4'], ['3', '4'], ['3', '3'], ['3', '1'], ['4', '3']]
And as a comprehension:
t = [i for coord in zip(j, k) for i in coord]
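The zip(j, k) variants interleave whole sublists. If the goal is instead the output listed in the question, which appears to pair the n-th number of j with the n-th number of k, a possible sketch is to flatten both lists first and zip the flattened values. This reading of the expected output is an assumption, and the values stay strings here rather than the integers shown in the question.

```python
from itertools import chain

j = [['4', '5'], ['1', '1'], ['1', '5'], ['3', '4'], ['3', '1']]
k = [['5', '2'], ['4', '2'], ['2', '4'], ['3', '3'], ['4', '3']]

# Flatten each nested list, then pair the values position by position.
t = [[a, b] for a, b in zip(chain.from_iterable(j), chain.from_iterable(k))]
print(t)
# [['4', '5'], ['5', '2'], ['1', '4'], ['1', '2'], ['1', '2'],
#  ['5', '4'], ['3', '3'], ['4', '3'], ['3', '4'], ['1', '3']]
```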

How can i sort the list with keys in python

I have two lists of dictionaries like this:
obj1 = [mydict['obj1'],mydict['obj2'],mydict['obj3'],mydict['obj4']]
obj2 = [mydict['obj1'],mydict['obj2'],mydict['obj3'],mydict['obj4'], mydict['obj5'] ]
Now I want to:
Count the number of elements in each list
Then, based on whichever count is greater, get that list of objects first
I want a single list that contains the two lists (of dictionaries) above, ordered by number of elements with the larger one first, so that I can use something like this:
mylist = myfunc(objects1, objects2 )
mylist should be a list like [objects1, objects2], depending on which one has the greater number of objects.
What is the best way to do that with the fewest lines of code?
EDIT: Something like
mylist = sorted([obj1, obj2], key=lambda a: len(a), reverse=True)

There's no need to use a lambda function if it's just going to call a function anyway.
>>> objects1 = [1, 2, 3]
>>> objects2 = ['1', '2', '3', '4']
>>>
>>> mylist = [objects1, objects2]
>>> max(mylist, key=len)
['1', '2', '3', '4']
>>> sorted(mylist, key=len, reverse=True)
[['1', '2', '3', '4'], [1, 2, 3]]

objects1 = [1, 2, 3]
objects2 = ['1', '2', '3', '4']
mylist = [objects1, objects2]
mylist.sort(key=len, reverse=True)
print(mylist)
[['1', '2', '3', '4'], [1, 2, 3]]
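Wrapped up as the myfunc-style helper the question asks for, this could look roughly like the sketch below. The name order_by_size is hypothetical, and accepting any number of lists is an assumption that goes beyond the two lists in the question.

```python
def order_by_size(*lists):
    """Return the given lists ordered from longest to shortest."""
    # sorted() is stable, also with reverse=True, so lists of equal
    # length keep the order in which they were passed in.
    return sorted(lists, key=len, reverse=True)


objects1 = [1, 2, 3]
objects2 = ['1', '2', '3', '4']
mylist = order_by_size(objects1, objects2)
print(mylist)  # [['1', '2', '3', '4'], [1, 2, 3]]
```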

how to create a sub list for a specific string in a nested list

I have a Python nested list that I'm trying to organize and, eventually, count the number of occurrences in. The nested list looks like:
[['22', '1'], ['21', '15'], ['11', '3'], ['31', '4'], ['41', '13'],...]
The first thing I want to do is create a sublist that only contains the entries whose second item is '1'. I was able to do this with the following code:
Subbasin_1 = []
Subbasin_1.append([x for x in Subbasins_Imp if x[1] == '1'])
print(Subbasin_1)
Giving these results, which are correct:
[['21', '1'], ['21', '1'], ['21', '1'], ['21', '1'], ['22', '1'],...]
Now I want to create another sublist that gives me all the entries in Subbasin_1 whose first item is '21'. When I use the same line of code with the appropriate items changed, I get an empty list. Not sure what is going on...?
OS_Count1 = []
OS_Count1.append([x for x in Subbasin_1 if x[0] == '21'])
print(OS_Count1)
The result is [[]]. What's the difference between the two?
Thanks for any help...

I don't believe that your
[['21', '1'], ['21', '1'], ['21', '1'], ['21', '1'], ['22', '1'],...]
line could be produced by the code you gave. Your Subbasin_1.append line appends a list to the empty list Subbasin_1, so you should get something like
[[['22', '1'], ['21', '1']]]
with one extra level of nesting.
If you avoid the unnecessary construction of an empty list + append, you should get what you want:
>>> Subbasins_Imp = [['22', '1'], ['21', '15'], ['11', '3'], ['31', '4'], ['41', '13'], ['21', '1']]
>>>
>>> Subbasin_1 = [x for x in Subbasins_Imp if x[1] == '1']
>>> print(Subbasin_1)
[['22', '1'], ['21', '1']]
>>> OS_Count1 = [x for x in Subbasin_1 if x[0] == '21']
>>> print(OS_Count1)
[['21', '1']]
Alternatively, you could simply replace append with extend. I don't recommend this, but it might help you to see what's happening:
>>> Subbasins_Imp = [['22', '1'], ['21', '15'], ['11', '3'], ['31', '4'], ['41', '13'], ['21', '1']]
>>>
>>> Subbasin_1 = []
>>> Subbasin_1.extend([x for x in Subbasins_Imp if x[1] == '1'])
>>> print(Subbasin_1)
[['22', '1'], ['21', '1']]
>>>
>>> OS_Count1 = []
>>> OS_Count1.extend([x for x in Subbasin_1 if x[0] == '21'])
>>> print(OS_Count1)
[['21', '1']]

Your list comprehension [x for x in Subbasins_Imp if x[1] == '1'] creates a list by itself, which means when you append that list to Subbasin_1, you end up with a doubly nested list.
Compare:
sub_imp = [['22', '1'], ['21', '15'], ['11', '3'], ['31', '4'], ['41', '13']]
sub_1 = [x for x in sub_imp if x[1] == '1']
sub_2 = []
sub_2.append([x for x in sub_imp if x[1] == '1'])
print(sub_1)
print(sub_2)

Running your code I obtained a triple nested list....
Sub = [[['21','1'],....]]
Instead of doing:
Subbasin_1 = []
Subbasin_1.append([x for x in Sub if x[1]=='1'])
Simply use the list comprehension:
Subbasin_1 = [x for x in Sub if x[1] == '1']
This will give you the result you are expecting.

There is no difference, which implies that Subbasin_1 might be empty at the time of the call or doesn't contain the data you think it does. It might also be that Subbasin_1 is nested three levels deep, not two.
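Since the stated end goal is to count occurrences, the filtering and counting steps could probably be combined with collections.Counter. This is a sketch only: the sample data is the short list used in the answers above, and the lowercase variable names are chosen for illustration.

```python
from collections import Counter

subbasins_imp = [['22', '1'], ['21', '15'], ['11', '3'],
                 ['31', '4'], ['41', '13'], ['21', '1']]

# Count first items, but only for entries whose second item is '1'.
counts = Counter(first for first, second in subbasins_imp if second == '1')
print(counts['21'])  # 1
print(counts['22'])  # 1
print(counts['11'])  # 0 (a Counter returns 0 for missing keys)
```

This avoids the intermediate lists entirely, so the extra nesting level from append cannot occur.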
