Python: Marking duplicates in list

I have an unordered Python list, and I want to create a second list that tells whether each value in the first list is a duplicate or unique.
Duplicates have to be marked as dup1, dup2 and so on.
I will create a dictionary from these lists, and later they will be used in a pandas DataFrame.
I am stuck on the logic for the second list. Could someone please help?
first_List = ['a', 'b', 'c', 'a', 'd', 'c']
EXPECTED OUTPUT:
second_List = ['dup1', 'unique', 'dup1', 'dup2', 'unique', 'dup2']

You can iterate over the list by index and, for the value at each index, check whether it is a duplicate (the isDuplicate boolean in the code below). If it is, count how many times the value has appeared in the list up to and including that index, and append the corresponding string to second_List:
second_List = []
for i in range(len(first_List)):
    isDuplicate = first_List.count(first_List[i]) > 1
    if isDuplicate:
        count = first_List[:i+1].count(first_List[i])
        second_List.append(f'dup{count}')
    else:
        second_List.append('unique')
OUTPUT:
['dup1', 'unique', 'dup1', 'dup2', 'unique', 'dup2']
Here is the equivalent List-Comprehension as well, if you are interested!
>>> [f'dup{first_List[:i+1].count(first_List[i])}'
... if first_List.count(first_List[i]) > 1
... else 'unique'
... for i in range(len(first_List))]
['dup1', 'unique', 'dup1', 'dup2', 'unique', 'dup2']
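Both versions above call list.count inside the loop, so the work grows quadratically with the length of the list. If the list is long, a single pass with collections.Counter should give the same labels. This is only a sketch, and the function name mark_duplicates is made up for illustration:

```python
from collections import Counter

def mark_duplicates(values):
    """Label each value 'unique' or 'dupN', where N is its occurrence number."""
    totals = Counter(values)   # how often each value occurs overall
    seen = Counter()           # how often each value has occurred so far
    labels = []
    for value in values:
        if totals[value] > 1:
            seen[value] += 1
            labels.append(f'dup{seen[value]}')
        else:
            labels.append('unique')
    return labels

first_List = ['a', 'b', 'c', 'a', 'd', 'c']
second_List = mark_duplicates(first_List)
print(second_List)
# ['dup1', 'unique', 'dup1', 'dup2', 'unique', 'dup2']
```

This assumes the values are hashable, which holds for the strings in the question.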

A shorter version:
first_List = ['a', 'b', 'c', 'a', 'd', 'c']
d = {i:'' for i in first_List if first_List.count(i) > 1}
second_List = ['unique' if i not in d.keys() else f'dup{list(d.keys()).index(i)+1}' for i in first_List]
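Note that this numbers each distinct duplicated value rather than each occurrence: every 'a' becomes dup1 and every 'c' becomes dup2, so the result is ['dup1', 'unique', 'dup2', 'dup1', 'unique', 'dup2'], which differs from the expected output in the question.
This is the same as
d = {i:'' for i in first_List if first_List.count(i) > 1}
second_List = list()
for i in first_List:
    text = 'unique' if i not in d.keys() else f'dup{list(d.keys()).index(i)+1}'
    second_List.append(text)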

Related

Would like to prevent dupes in a python list of lists

I am creating a list of lists and want to prevent dupes. For example, I have:
mainlist = [['a', 'b'], ['c', 'd'], ['a', 'd']]
The next item (list) to be added is ['b', 'a'], which is considered a duplicate of ['a', 'b'].
UPDATE
mainlist = [['a', 'b'], ['c', 'd'], ['a', 'd']]
swap = ['b', 'a']
for item in mainlist:
    if set(item) & set(swap):
        print("match was found", item)
    else:
        mainlist.append(swap)
Any suggestions as to how I can test whether the next item to be added is already in the list?

Here's an approach using frozensets within a set to check for duplicates. It's a bit ugly since I'm invoking a function that works with global variables.
def add_to_mainlist(new_list):
    if frozenset(new_list) not in dups:
        mainlist.append(new_list)
mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]
dups = set()
for l in mainlist:
    dups.add(frozenset(l))
print("Before:", mainlist)
add_to_mainlist(['a', 'b'])
print("After:", mainlist)
This outputs:
Before: [['a', 'b'], ['c', 'd'], ['a', 'd']]
After: [['a', 'b'], ['c', 'd'], ['a', 'd']]
Showing that the new list was indeed not added to the original.
Here's a cleaner version that calculates the existing set on the fly inside a function that does everything locally:
def add_to_mainlist(mainlist, new_list):
    dups = set()
    for l in mainlist:
        dups.add(frozenset(l))
    if frozenset(new_list) not in dups:
        mainlist.append(new_list)
    return mainlist
mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]
print("Before:", mainlist)
mainlist = add_to_mainlist(mainlist, ['a', 'b']) # the assignment isn't needed, but done anyway :-)
print("After:", mainlist)
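Rebuilding the set on every call costs time proportional to the length of mainlist. One possible alternative keeps the set of frozensets alongside the list, so each insertion is a single lookup. This is a sketch using a hypothetical UniqueLists class that is not part of the original answer:

```python
class UniqueLists:
    """A list of lists that ignores element order when checking for duplicates."""

    def __init__(self, lists=()):
        self.items = []
        self._seen = set()          # frozensets of everything in self.items
        for lst in lists:
            self.add(lst)

    def add(self, new_list):
        """Append new_list unless an equivalent list is present; report success."""
        key = frozenset(new_list)
        if key in self._seen:
            return False
        self._seen.add(key)
        self.items.append(new_list)
        return True

mainlist = UniqueLists([['a', 'b'], ['c', 'd'], ['a', 'd']])
print(mainlist.add(['b', 'a']))   # False, same elements as ['a', 'b']
print(mainlist.add(['b', 'c']))   # True
print(mainlist.items)
# [['a', 'b'], ['c', 'd'], ['a', 'd'], ['b', 'c']]
```

Because a frozenset ignores repeated elements, ['a', 'a', 'b'] would also count as a duplicate of ['a', 'b'] here.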
Why doesn't your existing code work?
This is what you're doing:
...
for item in mainlist:
    if set(item) & set(swap):
        print("match was found", item)
    else:
        mainlist.append(swap)
You're intersecting two sets and checking the truthiness of the result. That works when the lists share no elements, but if even one element is common (for example, ['a', 'b'] and ['b', 'd']), you'd still declare a match, which is a false positive.
Ideally you'd check that the resulting set has length 2:
dups = False
for item in mainlist:
    if len(set(item) & set(swap)) == 2:
        dups = True
        break
if not dups:
    mainlist.append(swap)
You also need a flag to record whether a duplicate was found. Your previous code appends as soon as one item fails to match, without checking the remaining items first.
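Comparing the sets for equality avoids hard-coding the length 2, and any() can stand in for the flag. A minimal sketch, assuming the order within each inner list doesn't matter (the helper name add_if_new is made up for illustration):

```python
def add_if_new(mainlist, candidate):
    """Append candidate unless some existing item has the same set of elements."""
    candidate_set = set(candidate)
    if not any(set(item) == candidate_set for item in mainlist):
        mainlist.append(candidate)
    return mainlist

mainlist = [['a', 'b'], ['c', 'd'], ['a', 'd']]
add_if_new(mainlist, ['b', 'a'])   # duplicate of ['a', 'b'], not added
add_if_new(mainlist, ['b', 'd'])   # shares only one element with each item, so it is added
print(mainlist)
# [['a', 'b'], ['c', 'd'], ['a', 'd'], ['b', 'd']]
```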

If the order of your inner lists doesn't matter, then this can trivially be accomplished using frozenset()s:
>>> mainlist = [['a', 'b'],['c', 'd'],['a', 'd']]
>>> mainlist = [frozenset(sublist) for sublist in mainlist]
>>>
>>> def add_to_list(lst, sublist):
...     if frozenset(sublist) not in lst:
...         lst.append(frozenset(sublist))
...
>>> mainlist
[frozenset({'a', 'b'}), frozenset({'d', 'c'}), frozenset({'a', 'd'})]
>>> add_to_list(mainlist, ['b', 'a'])
>>> mainlist
[frozenset({'a', 'b'}), frozenset({'d', 'c'}), frozenset({'a', 'd'})]
>>>
If the order does matter, you can either do what @Coldspeed suggested (construct a set() of frozensets from your list, construct a frozenset() from the list to be added, and test for membership), or you can use all() and sorted() to test whether the list to be added is equivalent to any of the other lists:
>>> def add_to_list(lst, sublist):
...     for l in lst:
...         if all(a == b for a, b in zip(sorted(sublist), sorted(l))):
...             return
...     lst.append(sublist)
...
>>> mainlist
[['a', 'b'], ['c', 'd'], ['a', 'd']]
>>> add_to_list(mainlist, ['b', 'a'])
>>> mainlist
[['a', 'b'], ['c', 'd'], ['a', 'd']]
>>>
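One caveat with the all()/zip() test: zip() stops at the shorter input, so ['a'] would be treated as equal to ['a', 'b']. Comparing the sorted lists directly avoids that. A sketch, assuming the elements are sortable:

```python
def add_to_list(lst, sublist):
    """Append sublist unless lst already holds the same elements in any order."""
    target = sorted(sublist)
    if any(sorted(existing) == target for existing in lst):
        return
    lst.append(sublist)

mainlist = [['a', 'b'], ['c', 'd'], ['a', 'd']]
add_to_list(mainlist, ['b', 'a'])   # already present as ['a', 'b']
add_to_list(mainlist, ['a'])        # different length, so it is added
print(mainlist)
# [['a', 'b'], ['c', 'd'], ['a', 'd'], ['a']]
```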

How to combine two elements of a list based on a given condition

I want to combine two elements in a list based on a given condition.
For example if I encounter the character 'a' in a list, I would like to combine it with the next element. The list:
['a', 'b', 'c', 'a', 'd']
becomes
['ab', 'c', 'ad']
Is there any quick way to do this?
One solution I have thought of is to create a new empty list and iterate through the first list. As we encounter the element 'a' in list 1, we join list1[index of a] and list1[index of a + 1] and append the result to list 2. However I wanted to know if there is any way to do it without creating a new list and copying values into it.

This does not create a new list, just modifies the existing one.
l = ['a', 'b', 'c', 'a', 'd']
for i in range(len(l)-2, -1, -1):
    if l[i] == 'a':
        l[i] = l[i] + l.pop(i+1)
print(l)

If you don't want to use a list comprehension to create a new list (maybe because your input list is huge), you could modify the list in place:
i = 0
while i < len(l):
    if l[i] == 'a':
        l[i] += l.pop(i+1)
    i += 1

Use a list comprehension with an iterator over your list. When the current item is 'a', simply join it with the next item from the iterator using next:
l = ['a', 'b', 'c', 'a', 'd']
it = iter(l)
l[:] = [i+next(it) if i == 'a' else i for i in it]
print(l)
# ['ab', 'c', 'ad']
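If the list can end with 'a', next(it) has nothing left to return. Passing a default to next() is one way around that. The following is a sketch for lists of strings; combine_after and its marker parameter are illustrative names, not part of the answer above:

```python
def combine_after(items, marker='a'):
    """Join each marker with the element that follows it, rewriting items in place."""
    it = iter(items)
    # next(it, '') falls back to an empty string when the marker is the last element.
    items[:] = [x + next(it, '') if x == marker else x for x in it]
    return items

l = ['a', 'b', 'c', 'a', 'd']
combine_after(l)
print(l)
# ['ab', 'c', 'ad']
```

The comprehension builds the complete result before the slice assignment replaces the contents of the list, so the iterator is never reading from a half-updated list.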

Well, if you really don't want to create a new list, here we go:
from itertools import islice
a = list("abcdabdbac")
i = 0
# Caveat: a first element other than 'a', or a trailing 'a', is dropped.
for x, y in zip(a, islice(a, 1, None)):
    if x == 'a':
        a[i] = x + y
        i += 1
    elif y != 'a':
        a[i] = y
        i += 1
del a[i:]  # slice deletion never raises, so no try/except is needed

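You could use itertools.groupby and group on whether:
the letter follows 'a', or
the letter is 'a'
using enumerate to generate the current index, which allows fetching the previous element from the list (this creates a new list, but it is a one-liner). Note that groupby merges adjacent items with the same key, so this works for the sample input but would also join two consecutive letters that are neither 'a' nor preceded by 'a'.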
import itertools
l = ['a', 'b', 'c', 'a', 'd']
print(["".join(x[1] for x in v) for _,v in itertools.groupby(enumerate(l),key=lambda t: (t[0] > 0 and l[t[0]-1]=='a') or t[1]=='a')])
result:
['ab', 'c', 'ad']

This is an easy way, though maybe not a Pythonic one.
l1 = ['a', 'b', 'c', 'a', 'd']
do_combine = False
combine_element = None
for el in l1:
    if do_combine:
        indx = l1.index(el)
        l1[indx] = combine_element + el
        do_combine = False
        l1.remove(combine_element)
    if el == 'a':
        combine_element = el
        do_combine = True
print(l1)
# ['ab', 'c', 'ad']

Including first and last elements in list comprehension

I would like to keep the first and last elements of a list, and exclude others that meet defined criteria without using a loop. The first and last elements may or may not have the criteria of elements being removed.
As a very basic example,
aList = ['a','b','a','b','a']
[x for x in aList if x !='a']
returns ['b', 'b']
I need ['a','b','b','a']
I can split off the first and last values and then re-concatenate them together, but this doesn't seem very Pythonic.

You can use slice assignment:
>>> aList = ['a','b','a','b','a']
>>> aList[1:-1]=[x for x in aList[1:-1] if x !='a']
>>> aList
['a', 'b', 'b', 'a']

Yup, it looks like dawg’s and jez’s suggested answers are the right ones, here. Leaving the below for posterity.
Hmmm, your sample input and output don’t match what I think your question is, and it is absolutely pythonic to use slicing:
a_list = ['a','b','a','b','a']
# a_list = a_list[1:-1] # take everything but the first and last elements
a_list = a_list[:2] + a_list[-2:] # this gets you the [ 'a', 'b', 'b', 'a' ]

Here's a list comprehension that explicitly makes the first and last elements immune from removal, regardless of their value:
>>> aList = ['a', 'b', 'a', 'b', 'a']
>>> [ letter for index, letter in enumerate(aList) if letter != 'a' or index in [0, len(aList)-1] ]
['a', 'b', 'b', 'a']

Try this:
>>> list_ = ['a', 'b', 'a', 'b', 'a']
>>> [value for index, value in enumerate(list_) if index in {0, len(list_)-1} or value == 'b']
['a', 'b', 'b', 'a']
Although, the list comprehension is becoming unwieldy. Consider writing a generator like so:
>>> def keep_bookends_and_bs(list_):
...     for index, value in enumerate(list_):
...         if index in {0, len(list_)-1}:
...             yield value
...         elif value == 'b':
...             yield value
...
>>> list(keep_bookends_and_bs(list_))
['a', 'b', 'b', 'a']
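The same idea can be made reusable by passing the filter in as a function. A sketch, where filter_keep_ends and its keep argument are names invented for this example:

```python
def filter_keep_ends(items, keep):
    """Return items filtered by keep(), but always retain the first and last."""
    last = len(items) - 1
    return [x for i, x in enumerate(items) if i in (0, last) or keep(x)]

aList = ['a', 'b', 'a', 'b', 'a']
print(filter_keep_ends(aList, lambda x: x != 'a'))
# ['a', 'b', 'b', 'a']
```

Unlike the slice-assignment answer, this returns a new list and leaves the input unchanged.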

Recursively replacing items in a list with items of a different list

I've implemented this functionality with recursion, but I also want a version without recursion, because Python has a recursion limit and there are problems when sharing data.
sublist2 = [{'nothing': "Notice This"}]
sublist1 = [{'include':[sublist2]}]
mainlist = [{'nothing': 1}, {'include':[sublist1, sublist2]},
{'nothing': 2}, {'include':[sublist2]}]
What should go in the Todo?
for i in mainlist:
    if 'nothing' in i:
        pass  # Do nothing
    elif 'include' in i:
        # Todo
        # Append the contents of the list mentioned recursively
        # in its own place without disrupting the flow
        pass
After the operation, the expected result is:
mainlist = [{'nothing': 1},
{'nothing': "Notice This"}, {'nothing': "Notice This"},
{'nothing':2},
{'nothing': "Notice This"}]
Notice that sublist1 refers to sublist2. That's the reason
{'include':[sublist1, sublist2]} is replaced by
{'nothing':"Notice This"}, {'nothing':"Notice This"}
I've tried the following
Inserting values into specific locations in a list in Python
How to get item's position in a list?

Instead of using recursion, you just look at the nth element and change it in place until it doesn't need any further processing.
sublist2 = [{'nothing': "Notice This"}]
sublist1 = [{'include':[sublist2]}]
mainlist = [{'nothing': 1}, {'include':[sublist1, sublist2]},
            {'nothing': 2}, {'include':[sublist2]}]
index = 0
while index < len(mainlist):
    if 'nothing' in mainlist[index]:
        index += 1
    elif 'include' in mainlist[index]:
        # replace the 'include' entries with their corresponding list
        mainlist[index:index+1] = mainlist[index]['include']
    elif isinstance(mainlist[index], list):
        # if an entry is a list, replace it with its entries
        mainlist[index:index+1] = mainlist[index]
Note the difference between assigning to an entry l[0] and assigning to a slice l[0:1]
>>> l = [1, 2, 3, 4]
>>> l[3] = ['a', 'b', 'c']
>>> l
[1, 2, 3, ['a', 'b', 'c']]
>>> l[0:1] = ['x', 'y', 'z']
>>> l
['x', 'y', 'z', 2, 3, ['a', 'b', 'c']]
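The loop can also be wrapped in a function that works on a copy, so the caller's list stays untouched. This is a sketch under the question's data layout (dicts with either a 'nothing' or an 'include' key, and nested lists); the name expand_includes is made up here:

```python
def expand_includes(items):
    """Return a flat copy of items with every 'include' entry expanded in place."""
    result = list(items)  # shallow copy; the nested sublists themselves are not modified
    index = 0
    while index < len(result):
        entry = result[index]
        if isinstance(entry, list):
            # splice the list's entries into the position of the list itself
            result[index:index + 1] = entry
        elif 'include' in entry:
            # splice the included lists in; they are flattened on later passes
            result[index:index + 1] = entry['include']
        else:
            index += 1    # nothing to expand, move on
    return result

sublist2 = [{'nothing': "Notice This"}]
sublist1 = [{'include': [sublist2]}]
mainlist = [{'nothing': 1}, {'include': [sublist1, sublist2]},
            {'nothing': 2}, {'include': [sublist2]}]
print(expand_includes(mainlist))
```

Slice assignment keeps each expanded entry at the position of the 'include' it replaces. A list that includes itself, directly or indirectly, would make this loop run forever, so cyclic includes would need a separate check.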

Python: if element in one list, change element in other?

I have two lists (of different lengths). One changes throughout the program (list1), the other (longer) doesn't (list2). Basically I have a function that is supposed to compare the elements in both lists, and if an element in list1 is in list2, that element in a copy of list2 is changed to 'A', and all other elements in the copy are changed to 'B'. I can get it to work when there is only one element in list1. But for some reason if the list is longer, all the elements in list2 turn to B....
def newList(list1,list2):
    newList= list2[:]
    for i in range(len(list2)):
        for element in list1:
            if element==newList[i]:
                newList[i]='A'
            else:
                newList[i]='B'
    return newList

Try this:
newlist = ['A' if x in list1 else 'B' for x in list2]
This works for the following example. I hope I understood you correctly: if a value in b exists in a, insert 'A', otherwise insert 'B' into a new list?
>>> a = [1,2,3,4,5]
>>> b = [1,3,4,6]
>>> ['A' if x in a else 'B' for x in b]
['A', 'A', 'A', 'B']
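The loop in the question likely fails because the inner loop keeps overwriting newList[i]: after the first element of list1 has been compared, the entry already holds 'A' or 'B', so every later element of list1 is compared against that marker instead of the original value and the entry ends up as 'B'. Testing membership once per entry avoids this. For long lists, converting list1 to a set first makes each lookup constant time. A sketch, assuming the elements are hashable (mark_members is an illustrative name):

```python
def mark_members(changing, static):
    """Return a new list with 'A' where a value of static is in changing, else 'B'."""
    lookup = set(changing)         # assumes hashable elements
    return ['A' if x in lookup else 'B' for x in static]

print(mark_members([1, 2, 3], [3, 4, 7, 2, 6, 8, 9, 1]))
# ['A', 'B', 'B', 'A', 'B', 'B', 'B', 'A']
```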

It could be because instead of
newList: list2[:]
you should have
newList = list2[:]
Personally, I prefer the following syntax, which I find to be more explicit:
import copy
newList = copy.copy(list2) # or copy.deepcopy
Now, I'd imagine part of the problem here is also that you use the same name, newList, for both your function and a local variable. That's not so good.
def newList(changing_list, static_list):
    temporary_list = static_list[:]
    for index, content in enumerate(temporary_list):
        if content in changing_list:
            temporary_list[index] = 'A'
        else:
            temporary_list[index] = 'B'
    return temporary_list
Note here that you have not made it clear what to do when there are multiple entries in list1 and list2 that match. My code marks all of the matching ones 'A'. Example:
>>> a = [1, 2, 3]
>>> b = [3,4,7,2,6,8,9,1]
>>> newList(a,b)
['A', 'B', 'B', 'A', 'B', 'B', 'B', 'A']

I think this is what you want to do. You could write newLis = list2[:] instead of the line below, but I prefer list() in these cases:
def newList1(list1,list2):
    newLis = list(list2)
    for i in range(len(list2)):
        if newLis[i] in list1:
            newLis[i]='A'
        else:
            newLis[i]='B'
    return newLis
The answer when passed
newList1(range(5),range(10))
is:
['A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B']
