Removing duplicate element from a list and the element itself

I know this question has been asked many times, but I am not asking how to remove only the extra copies from a list. I want to remove every occurrence of any element that is duplicated.
For example, if I have a list:
x = [1, 2, 5, 3, 4, 1, 5]
I want the list to be:
x = [2, 3, 4] # removed 1 and 5 since they were repeated
I can't use a set, since that will include 1 and 5.
Should I use a Counter? Is there a better way?

This should be done with a Counter object. It's trivial.
from collections import Counter
x = [k for k, v in Counter([1, 2, 5, 3, 4, 1, 5]).iteritems() if v == 1]
print x
Output:
[2, 3, 4]
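The snippet above is Python 2 (iteritems and the print statement). A minimal Python 3 sketch of the same Counter idea follows; the helper name drop_repeated is made up for illustration, and it filters the original list so the input order is kept explicitly.

```python
from collections import Counter

def drop_repeated(items):
    """Return the items that occur exactly once, in their original order."""
    counts = Counter(items)  # one O(n) pass to count every value
    return [item for item in items if counts[item] == 1]

print(drop_repeated([1, 2, 5, 3, 4, 1, 5]))  # [2, 3, 4]
```

Counting once and then filtering keeps the whole thing linear, unlike calling list.count for every element.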

Maybe this way:
[_ for _ in x if x.count(_) == 1]
EDIT: This is not the best way in terms of time complexity, as you can see in the comment above. Sorry, my mistake.

Something more verbose and O(n):
x = [1, 2, 2, 3, 4]
def counts_fold(acc, x):
    acc[x] = acc[x]+1 if x in acc else 1
    return acc
counts = reduce(counts_fold, x, {})
y = [i for i in x if counts[i] == 1]
print y

How about
duplicates = set(x)
x = [elem for elem in x if elem not in duplicates]
This has the advantage of being O(n) instead of O(n^2).
Edit. Indeed my bad, I must have been half asleep. Mahmoud's answer above is the correct one.

Related

Identifying certain elements which are not in one list in Python [duplicate]

I want to take the difference between lists x and y:
>>> x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> y = [1, 3, 5, 7, 9]
>>> x - y
# should return [0, 2, 4, 6, 8]

Use a list comprehension to compute the difference while maintaining the original order from x:
[item for item in x if item not in y]
If you don't need list properties (e.g. ordering), use a set difference, as the other answers suggest:
list(set(x) - set(y))
To allow x - y infix syntax, override __sub__ on a class inheriting from list:
class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(args)
    def __sub__(self, other):
        return self.__class__(*[item for item in self if item not in other])
Usage:
x = MyList(1, 2, 3, 4)
y = MyList(2, 5, 2)
z = x - y

Use set difference
>>> z = list(set(x) - set(y))
>>> z
[0, 8, 2, 4, 6]
Or you might just have x and y be sets so you don't have to do any conversions.

If duplicates and ordering matter (note that this mutates b):
a = [1,2,3,3,3,3,4]
b = [1,3]
[i for i in a if not i in b or b.remove(i)]
result: [2, 3, 3, 3, 4]

That is a "set subtraction" operation. Use the set data structure for that.
In Python 2.7:
x = {1,2,3,4,5,6,7,8,9,0}
y = {1,3,5,7,9}
print x - y
Output:
>>> print x - y
set([0, 8, 2, 4, 6])

For many use cases, the answer you want is:
ys = set(y)
[item for item in x if item not in ys]
This is a hybrid between aaronasterling's answer and quantumSoup's answer.
aaronasterling's version does len(y) item comparisons for each element in x, so it takes quadratic time. quantumSoup's version uses sets, so it does a single constant-time set lookup for each element in x—but, because it converts both x and y into sets, it loses the order of your elements.
By converting only y into a set, and iterating x in order, you get the best of both worlds—linear time, and order preservation.*
However, this still has a problem from quantumSoup's version: It requires your elements to be hashable. That's pretty much built into the nature of sets.** If you're trying to, e.g., subtract a list of dicts from another list of dicts, but the list to subtract is large, what do you do?
If you can decorate your values in some way that they're hashable, that solves the problem. For example, with a flat dictionary whose values are themselves hashable:
ys = {tuple(item.items()) for item in y}
[item for item in x if tuple(item.items()) not in ys]
If your types are a bit more complicated (e.g., often you're dealing with JSON-compatible values, which are hashable, or lists or dicts whose values are recursively the same type), you can still use this solution. But some types just can't be converted into anything hashable.
If your items aren't, and can't be made, hashable, but they are comparable, you can at least get log-linear time (O(N*log M), which is a lot better than the O(N*M) time of the list solution, but not as good as the O(N+M) time of the set solution) by sorting and using bisect:
import bisect
ys = sorted(y)
def bisect_contains(seq, item):
    # bisect_left returns the leftmost position where item could be inserted
    index = bisect.bisect_left(seq, item)
    return index < len(seq) and seq[index] == item
[item for item in x if not bisect_contains(ys, item)]
If your items are neither hashable nor comparable, then you're stuck with the quadratic solution.
* Note that you could also do this by using a pair of OrderedSet objects, for which you can find recipes and third-party modules. But I think this is simpler.
** The reason set lookups are constant time is that all it has to do is hash the value and see if there's an entry for that hash. If it can't hash the value, this won't work.
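As a rough sketch of the "decorate your values so they're hashable" idea for flat dicts: the helper name subtract_dicts is hypothetical, and it swaps in frozenset(item.items()) instead of tuple(item.items()), because a tuple of items depends on key insertion order while a frozenset does not. It still assumes every value in every dict is hashable.

```python
def subtract_dicts(x, y):
    """Return the dicts in x that do not appear in y, keeping x's order.

    Assumes each dict is flat and all of its values are hashable.
    """
    # frozenset ignores key order, so {'a': 2, 'b': 3} matches {'b': 3, 'a': 2}
    excluded = {frozenset(d.items()) for d in y}
    return [d for d in x if frozenset(d.items()) not in excluded]

x = [{"a": 1}, {"a": 2, "b": 3}, {"c": 4}]
y = [{"b": 3, "a": 2}]
print(subtract_dicts(x, y))  # [{'a': 1}, {'c': 4}]
```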

If the lists allow duplicate elements, you can use Counter from collections:
from collections import Counter
result = list((Counter(x)-Counter(y)).elements())
If you need to preserve the order of elements from x:
result = [ v for c in [Counter(y)] for v in x if not c[v] or c.subtract([v]) ]

The other solutions have one of a few problems:
They don't preserve order, or
They don't remove a precise count of elements, e.g. for x = [1, 2, 2, 2] and y = [2, 2] they convert y to a set, and either remove all matching elements (leaving [1] only) or remove one of each unique element (leaving [1, 2, 2]), when the proper behavior would be to remove 2 twice, leaving [1, 2], or
They do O(m * n) work, where an optimal solution can do O(m + n) work
Alain was on the right track with Counter to solve #2 and #3, but that solution will lose ordering. The solution that preserves order (removing the first n copies of each value for n repetitions in the list of values to remove) is:
from collections import Counter
x = [1,2,3,4,3,2,1]
y = [1,2,2]
remaining = Counter(y)
out = []
for val in x:
    if remaining[val]:
        remaining[val] -= 1
    else:
        out.append(val)
# out is now [3, 4, 3, 1], having removed the first 1 and both 2s.
To make it remove the last copies of each element, just change the for loop to for val in reversed(x): and add out.reverse() immediately after exiting the for loop.
Constructing the Counter is O(n) in terms of y's length, iterating x is O(n) in terms of x's length, and Counter membership testing and mutation are O(1), while list.append is amortized O(1) (a given append can be O(n), but for many appends, the overall big-O averages O(1) since fewer and fewer of them require a reallocation), so the overall work done is O(m + n).
You can also check whether any elements of y were not removed from x:
remaining = +remaining # Removes all keys with zero counts from Counter
if remaining:
    # remaining contained elements with non-zero counts
    pass
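The pieces of this answer (first-copy removal, the reversed variant, and the leftover check) can be bundled into one function. This is only a sketch: the name subtract_counts and the from_end flag are invented here and are not part of any library.

```python
from collections import Counter

def subtract_counts(x, y, from_end=False):
    """Remove from x as many copies of each value as y contains.

    Returns (result, leftover), where leftover counts the values of y
    that could not be matched in x.
    """
    remaining = Counter(y)
    source = reversed(x) if from_end else x
    out = []
    for val in source:
        if remaining[val] > 0:   # a missing key reads as 0 and is not inserted
            remaining[val] -= 1
        else:
            out.append(val)
    if from_end:
        out.reverse()            # restore the original left-to-right order
    return out, +remaining       # unary plus drops zero and negative counts

print(subtract_counts([1, 2, 3, 4, 3, 2, 1], [1, 2, 2]))
# ([3, 4, 3, 1], Counter())
```

With from_end=True the same call removes the last copies instead and returns [1, 3, 4, 3].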

Looking up values in a set is faster than looking them up in a list. Build the set once, outside the comprehension, so that it is not rebuilt for every item:
ys = set(y)
[item for item in x if item not in ys]
I believe this will scale slightly better than:
[item for item in x if item not in y]
Both preserve the order of the lists.

We can also use set methods to find the difference between two lists:
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
y = [1, 3, 5, 7, 9]
list(set(x).difference(y))
[0, 2, 4, 6, 8]

Try this.
def subtract_lists(a, b):
    """ Subtracts two lists. Throws ValueError if b contains items not in a """
    # Terminate if b is empty, otherwise remove b[0] from a and recurse
    return a if len(b) == 0 else [a[:i] + subtract_lists(a[i+1:], b[1:])
                                  for i in [a.index(b[0])]][0]
>>> x = [1,2,3,4,5,6,7,8,9,0]
>>> y = [1,3,5,7,9]
>>> subtract_lists(x,y)
[2, 4, 6, 8, 0]
>>> x = [1,2,3,4,5,6,7,8,9,0,9]
>>> subtract_lists(x,y)
[2, 4, 6, 8, 0, 9] #9 is only deleted once
>>>

The answer provided by @aaronasterling looks good; however, it is not compatible with the default interface of list: x = MyList(1, 2, 3, 4) vs x = MyList([1, 2, 3, 4]). The code below is a more list-friendly version:
class MyList(list):
    def __init__(self, *args):
        super(MyList, self).__init__(*args)
    def __sub__(self, other):
        return self.__class__([item for item in self if item not in other])
Example:
x = MyList([1, 2, 3, 4])
y = MyList([2, 5, 2])
z = x - y

from collections import Counter
y = Counter(y)
x = Counter(x)
print(list(x-y))

Let:
>>> xs = [1, 2, 3, 4, 3, 2, 1]
>>> ys = [1, 3, 3]
Keep each unique item only once   xs - ys == {2, 4}
Take the set difference:
>>> set(xs) - set(ys)
{2, 4}
Remove all occurrences   xs - ys == [2, 4, 2]
>>> [x for x in xs if x not in ys]
[2, 4, 2]
If ys is large, convert only ys (see footnote 1) into a set for better performance:
>>> ys_set = set(ys)
>>> [x for x in xs if x not in ys_set]
[2, 4, 2]
Only remove same number of occurrences   xs - ys == [2, 4, 2, 1]
from collections import Counter
def diff(xs, ys):
    counter = Counter(ys)
    for x in xs:
        if counter[x] > 0:
            counter[x] -= 1
            continue
        yield x
>>> list(diff(xs, ys))
[2, 4, 2, 1]
1 Converting xs to a set and taking the set difference is unnecessary (and slower, as well as order-destroying), since we only need to iterate over xs once.

This example subtracts two lists:
# List of pairs of points
list = []
list.append([(602, 336), (624, 365)])
list.append([(635, 336), (654, 365)])
list.append([(642, 342), (648, 358)])
list.append([(644, 344), (646, 356)])
list.append([(653, 337), (671, 365)])
list.append([(728, 13), (739, 32)])
list.append([(756, 59), (767, 79)])
itens_to_remove = []
itens_to_remove.append([(642, 342), (648, 358)])
itens_to_remove.append([(644, 344), (646, 356)])
print("Initial List Size: ", len(list))
for a in itens_to_remove:
    for b in list:
        if a == b :
            list.remove(b)
print("Final List Size: ", len(list))

list1 = ['a', 'c', 'a', 'b', 'k']
list2 = ['a', 'a', 'a', 'a', 'b', 'c', 'c', 'd', 'e', 'f']
for e in list1:
    try:
        list2.remove(e)
    except ValueError:
        print(f'{e} not in list')
list2
# ['a', 'a', 'c', 'd', 'e', 'f']
This will change list2. If you want to keep list2 unchanged, make a copy and use the copy in this code.

def listsubtraction(parent,child):
    answer=[]
    for element in parent:
        if element not in child:
            answer.append(element)
    return answer
I think this should work. I am a beginner so pardon me for any mistakes

Why do I get an IndexError: list index out of range

My code raises IndexError: list index out of range. Could you please 1) explain why this happens and 2) suggest corrections to my code? Thank you in advance.
x = [1, 2, 3, 4, 5]
for i in range(len(x)):
    if x[i] % 2 == 0:
        del x[i]

When you use del you shorten the list, but range(len(x)) was computed from the original length, so the loop eventually asks for an index that no longer exists; hence the IndexError.
If you want to delete items I recommend using list comprehension:
x = [1, 2, 3, 4, 5]
x_filtered = [i for i in x if i%2]
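To make the failure concrete, the sketch below reruns the question's loop, catches the error, and records where it happened. The variable name failed_at is just for illustration. It also shows slice assignment as one way to filter while keeping the same list object, for cases where other references to the list must see the change.

```python
x = [1, 2, 3, 4, 5]
try:
    for i in range(len(x)):      # range(5) is fixed before the loop starts
        if x[i] % 2 == 0:
            del x[i]             # the list shrinks, the range does not
except IndexError:
    failed_at = (i, len(x))      # index requested, length at that moment
print(failed_at)                 # (3, 3)

# In-place alternative: replace the contents of the same list object.
y = [1, 2, 3, 4, 5]
y[:] = [item for item in y if item % 2]
print(y)                         # [1, 3, 5]
```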

Use a new list (comprehension) instead:
x = [1, 2, 3, 4, 5]
y = [item for item in x if not item % 2 == 0]
print(y)
# [1, 3, 5]
Or - considered "more pythonic":
y = [item for item in x if item % 2]

This is because you are removing objects inside the loop, in other words making the list shorter.
Instead use this:
x = x[0::2]
to select every second value of the list.
If you want all the even values, use a list comprehension instead:
x = [value for value in x if value%2 == 0]

You are deleting items from the very list you are iterating over. An alternative approach would be:
x = [1, 2, 3, 4, 5]
answer = [i for i in x if i % 2 != 0]
print(answer)
Outputs:
[1, 3, 5]

x = [1, 2, 3, 4, 5]
for i in range(len(x) -1, -1, -1):
    if x[i] % 2 == 0:
        x.pop(i)
"range function takes three arguments.
First is the start index which is [length of list – 1], that is, the index of last list element(since index of list elements starts from 0 till length – 1).
Second argument is the index at which to stop iteration.
Third argument is the step size.
Since we need to decrease index by 1 in every iteration, this should be -1." - Source
I highly recommend a list comprehension; however, in certain circumstances there is no point, and removing through iteration is fine. Up to you~

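Use a while loop instead of a for loop if you want to delete items. Advance the index only when nothing was deleted; otherwise the element that slides into position i is skipped:
x = [1, 2, 3, 4, 5]
i = 0
while i<len(x):
    if x[i]%2==0:
        del x[i]
    else:
        i+=1
print(x)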

Function that compares 1st and last element, 2nd and 2nd last element, and so on

I want to write a function that compares the first element of this list with the last element of this list, the second element of this list with the second last element of this list, and so on. If the compared elements are the same, I want to add the element to a new list. Finally, I'd like to print this new list.
For example,
>>> f([1,5,7,7,8,1])
[1,7]
>>> f([3,1,4,1,5])
[1,4]
>>> f([2,3,5,7,1,3,5])
[3,7]
I was thinking to take the first (i) and last (k) element, compare them, then raise i but lower k, then repeat the process. When i and k 'overlap', stop, and print the list. I've tried to visualise my thoughts in the following code:
def f(x):
newlist=[]
k=len(x)-1
i=0
for j in x:
if x[i]==x[k]:
if i<k:
newlist.append(x[i])
i=i+1
k=k-1
print(newlist)
Please let me know if there are any errors in my code, or if there is a more suitable way to address the problem.
As I am new to Python, I am not very good at understanding complicated terminology or features of Python, so I would appreciate it if you took this into account in your answer.

You could use a conditional list comprehension with enumerate, comparing the element x at index i to the element at index -1-i (-1 being the last index of the list):
>>> lst = [1,5,7,7,8,1]
>>> [x for i, x in enumerate(lst[:(len(lst)+1)//2]) if lst[-1-i] == x]
[1, 7]
>>> lst = [3,1,4,1,5]
>>> [x for i, x in enumerate(lst[:(len(lst)+1)//2]) if lst[-1-i] == x]
[1, 4]
Or, as already suggested in other answers, use zip. However, it is enough to slice the first argument; the second one can just be the reversed list, as zip will stop once one of the argument lists is finished, making the code a bit shorter.
>>> [x for x, y in zip(lst[:(len(lst)+1)//2], reversed(lst)) if x == y]
In both approaches, (len(lst)+1)//2 is equivalent to int(math.ceil(len(lst)/2)).
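Wrapped as a function, the zip variant might look like the sketch below. The name mirror_matches is made up here; the three calls are the examples from the question.

```python
def mirror_matches(lst):
    """Return the items that equal their mirror-image partner.

    Compares lst[0] with lst[-1], lst[1] with lst[-2], and so on.
    For odd lengths the middle item is compared with itself and kept.
    """
    half = (len(lst) + 1) // 2   # ceil(len / 2) using integer arithmetic
    # zip stops at the shorter input, so reversed(lst) needs no slicing
    return [a for a, b in zip(lst[:half], reversed(lst)) if a == b]

print(mirror_matches([1, 5, 7, 7, 8, 1]))      # [1, 7]
print(mirror_matches([3, 1, 4, 1, 5]))         # [1, 4]
print(mirror_matches([2, 3, 5, 7, 1, 3, 5]))   # [3, 7]
```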

Maybe you want something like this for a list of even length:
>>> r=[l[i] for i in range(len(l)/2) if l[i]==l[-(i+1)]]
>>> r
[3]
>>> l=[1,5,7,7,8,1]
>>> r=[l[i] for i in range(len(l)/2) if l[i]==l[-(i+1)]]
>>> r
[1, 7]
And for odd length of list :
>>> l=[3,1,4,1,5]
>>> r=[l[i] for i in range(len(l)/2+1) if l[i]==l[-(i+1)]]
>>> r
[1, 4]
so you can create a function :
def myfunc(mylist):
    if (len(mylist) % 2 == 0):
        return [mylist[i] for i in range(len(mylist)//2) if mylist[i]==mylist[-(i+1)]]
    else:
        return [mylist[i] for i in range(len(mylist)//2+1) if mylist[i]==mylist[-(i+1)]]
and use it this way :
>>> l=[1,5,7,7,8,1]
>>> myfunc(l)
[1, 7]
>>> l=[3,1,4,1,5]
>>> myfunc(l)
[1, 4]

What you can do is zip over the first half and the second half reversed and use list comprehensions to build a list of the same ones:
[element_1 for element_1, element_2 in zip(l[:len(l)//2], reversed(l[(len(l)+1)//2:])) if element_1 == element_2]
What happens is that you take the first half and iterate over those as element_1, the second half reversed as element_2 and then only add them if they are the same:
l = [1, 2, 3, 3, 2, 4]
l[:len(l)//2] == [1, 2, 3]
list(reversed(l[(len(l)+1)//2:])) == [4, 2, 3]
1 != 4, 2 == 2, 3 == 3, result == [2, 3]
If you also want to include the middle element in the case of an odd list, we can just extend our lists to both include the middle element, which will always evaluate as the same:
[element_1 for element_1, element_2 in zip(l[:(len(l) + 1)//2], reversed(l[len(l)//2:])) if element_1 == element_2]
l = [3, 1, 4, 1, 5]
l[:(len(l) + 1)//2] == [3, 1, 4]
list(reversed(l[len(l)//2:])) == [5, 1, 4]
3 != 5, 1 == 1, 4 == 4, result == [1, 4]

Here is my solution:
[el1 for (el1, el2) in zip(L[:len(L)//2+1], L[len(L)//2:][::-1]) if el1==el2]
There is a lot going on, so let me explain step by step:
L[:len(L)//2+1] is the first half of the list plus an extra element (which is useful for lists of odd lengths)
L[len(L)//2:][::-1] is the second half of the list, reversed ([::-1])
zip creates pairs from two lists and stops at the end of the shorter one. We rely on this when the length of the list is even, so that the extra element in the first half is ignored
A list comprehension is essentially equivalent to a for loop, but is useful for creating a list "on the fly". It keeps an element only if the if condition is true; otherwise it skips it.
You can easily modify the solution above if you are interested in the indexes (of the first half) where the match occurs:
[idx for idx, (el1, el2) in enumerate(zip(L[:len(L)//2+1], L[len(L)//2:][::-1])) if el1==el2]

You can use the following, which leverages zip_longest:
from itertools import zip_longest
def compare(lst):
    size = len(lst) // 2
    return [y for x, y in zip_longest(lst[:size], lst[-1:size-1:-1], fillvalue=None) if x == y or x is None]
print(compare([1, 5, 7, 7, 8, 1])) # [1, 7]
print(compare([3, 1, 4, 1, 5])) # [1, 4]
print(compare([2, 3, 5, 7, 1, 3, 5])) # [3, 7]
On zip_longest:
Normally, zip stops zipping when one of its iterators runs out. zip_longest does not have that limitation: it keeps zipping, substituting a fill value for the iterators that are exhausted.
Example:
list(zip([1, 2, 3], ['a'])) # [(1, 'a')]
list(zip_longest([1, 2, 3], ['a'], fillvalue='z')) # [(1, 'a'), (2, 'z'), (3, 'z')]

A cleaner/shorter way to solve this problem?

This exercise is taken from Google's Python Class:
D. Given a list of numbers, return a list where
all adjacent == elements have been reduced to a single element,
so [1, 2, 2, 3] returns [1, 2, 3]. You may create a new list or
modify the passed in list.
Here's my solution so far:
def remove_adjacent(nums):
    if not nums:
        return nums
    list = [nums[0]]
    for num in nums[1:]:
        if num != list[-1]:
            list.append(num)
    return list
But this looks more like a C program than a Python script, and I have a feeling this can be done much more elegantly.
EDIT
So [1, 2, 2, 3] should give [1, 2, 3] and [1, 2, 3, 3, 2] should give [1, 2, 3, 2]

There is a function in itertools that works here:
import itertools
[key for key,seq in itertools.groupby([1,1,1,2,2,3,4,4])]
You can also write a generator:
def remove_adjacent(items):
    # iterate the items
    it = iter(items)
    # get the first one
    last = next(it)
    # yield it in any case
    yield last
    for current in it:
        # if the next item is different yield it
        if current != last:
            yield current
            last = current
        # else: its a duplicate, do nothing with it
print list(remove_adjacent([1,1,1,2,2,3,4,4]))
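For the original exercise, the groupby one-liner can be dropped straight into the required function. A short Python 3 sketch, checked against the two cases from the question's EDIT:

```python
from itertools import groupby

def remove_adjacent(nums):
    """Collapse each run of equal adjacent items into a single item."""
    # groupby yields one (key, group) pair per run of consecutive equal items
    return [key for key, _group in groupby(nums)]

print(remove_adjacent([1, 2, 2, 3]))     # [1, 2, 3]
print(remove_adjacent([1, 2, 3, 3, 2]))  # [1, 2, 3, 2]
print(remove_adjacent([]))               # []
```

Unlike the generator version, this needs no special handling for an empty input.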

itertools to the rescue.
import itertools
def remove_adjacent(lst):
    i = iter(lst)
    yield next(i)
    for x, y in itertools.izip(lst, i):
        if x != y:
            yield y
L = [1, 2, 2, 3]
print list(remove_adjacent(L))

Solution using list comprehensions, zipping then iterating through a twice. Inefficient, but short and sweet. It also has the problem of extending a[1:] with something.
a = [ 1,2,2,2,3,4,4,5,3,3 ]
b = [ i for i,j in zip(a,a[1:] + [None]) if not i == j ]

This works, but I'm not quite happy with it yet because of the +[None] bit to ensure that the last element is also returned...
>>> mylist=[1,2,2,3,3,3,3,4,5,5,5]
>>> [x for x, y in zip(mylist, mylist[1:]+[None]) if x != y]
[1, 2, 3, 4, 5]
The most Pythonic way is probably to go the path of least resistance and use itertools.groupby() as suggested by THC4K and be done with it.

>>> def collapse( data ):
...     return list(sorted(set(data)))
...
>>> collapse([1,2,2,3])
[1, 2, 3]
Second attempt, after the additional requirement was added:
>>> def remove_adjacent( data ):
...     last = None
...     for datum in data:
...         if datum != last:
...             last = datum
...             yield datum
...
>>> list( remove_adjacent( [1,2,2,3,2] ) )
[1, 2, 3, 2]

You may want to look at itertools. Also, here's a tutorial on Python iterators and generators (pdf).

This is also somewhat functional; it could be written as a one-liner using lambdas but that would just make it more confusing. In Python 3 you'd need to import reduce from functools.
def remove_adjacent(nums):
    def maybe_append(l, x):
        return l + ([] if len(l) and l[-1] == x else [x])
    return reduce(maybe_append, nums, [])

How to find duplicate elements in array using for loop in Python?

I have a list with duplicate elements:
list_a=[1,2,3,5,6,7,5,2]
tmp=[]
for i in list_a:
    if tmp.__contains__(i):
        print i
    else:
        tmp.append(i)
I used the code above to find the duplicate elements in list_a. I don't want to remove any elements from the list, but I do want to use a for loop here.
In C/C++ I would normally write something like this:
for (int i=0;i<=list_a.length;i++)
    for (int j=i+1;j<=list_a.length;j++)
        if (list_a[i]==list_a[j])
            print list_a[i]
How do I write this in Python?
for i in list_a:
    for j in list_a[1:]:
        ....
I tried the code above, but it gives the wrong result. I don't know how to increment j.

Just for information, In python 2.7+, we can use Counter
import collections
x=[1, 2, 3, 5, 6, 7, 5, 2]
>>> x
[1, 2, 3, 5, 6, 7, 5, 2]
>>> y=collections.Counter(x)
>>> y
Counter({2: 2, 5: 2, 1: 1, 3: 1, 6: 1, 7: 1})
Unique List
>>> list(y)
[1, 2, 3, 5, 6, 7]
Items found more than 1 time
>>> [i for i in y if y[i]>1]
[2, 5]
Items found only one time
>>> [i for i in y if y[i]==1]
[1, 3, 6, 7]

Use the in operator instead of calling __contains__ directly.
What you have almost works (but is O(n**2)):
for i in xrange(len(list_a)):
    for j in xrange(i + 1, len(list_a)):
        if list_a[i] == list_a[j]:
            print "duplicate:", list_a[i]
But it's far easier to use a set (roughly O(n) due to the hash table):
seen = set()
for n in list_a:
    if n in seen:
        print "duplicate:", n
    else:
        seen.add(n)
Or a dict, if you want to track locations of duplicates (also O(n)):
import collections
items = collections.defaultdict(list)
for i, item in enumerate(list_a):
    items[item].append(i)
for item, locs in items.iteritems():
    if len(locs) > 1:
        print "duplicates of", item, "at", locs
Or even just detect a duplicate somewhere (also O(n)):
if len(set(list_a)) != len(list_a):
    print "duplicate"
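The snippets in this answer are Python 2 (xrange, iteritems, print statements). As a Python 3 sketch of the location-tracking variant, with a hypothetical helper name duplicate_positions:

```python
from collections import defaultdict

def duplicate_positions(items):
    """Map each duplicated item to the list of indices where it occurs."""
    positions = defaultdict(list)
    for index, item in enumerate(items):
        positions[item].append(index)
    # keep only the items that were seen at more than one index
    return {item: locs for item, locs in positions.items() if len(locs) > 1}

list_a = [1, 2, 3, 5, 6, 7, 5, 2]
print(duplicate_positions(list_a))  # {2: [1, 7], 5: [3, 6]}
```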

You could always use a list comprehension:
dups = [x for x in list_a if list_a.count(x) > 1]

Before Python 2.3, use dict():
>>> lst = [1, 2, 3, 5, 6, 7, 5, 2]
>>> stats = {}
>>> for x in lst : # count occurrences of each item:
...     stats[x] = stats.get(x, 0) + 1
>>> print stats
{1: 1, 2: 2, 3: 1, 5: 2, 6: 1, 7: 1}
>>> # filter items appearing more than once:
>>> duplicates = [dup for (dup, i) in stats.items() if i > 1]
>>> print duplicates
So, as a function:
def getDuplicates(iterable):
    """
    Take an iterable and return a generator yielding its duplicate items.
    Items must be hashable.
    e.g :
    >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
    [2, 5]
    """
    stats = {}
    for x in iterable :
        stats[x] = stats.get(x, 0) + 1
    return (dup for (dup, i) in stats.items() if i > 1)
Python 2.3 introduced sets (in the sets module), and set() became a built-in in Python 2.4:
def getDuplicates(iterable):
    """
    Take an iterable and return a generator yielding its duplicate items.
    Items must be hashable.
    e.g :
    >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
    [2, 5]
    """
    try: # try using built-in set
        found = set()
    except NameError: # fallback on the sets module
        from sets import Set
        found = Set()
    for x in iterable:
        if x in found : # set is a collection that can't contain duplicate
            yield x
        found.add(x) # duplicate won't be added anyway
With Python 2.7 and above, the collections module provides Counter, which does the same job as the dict version, so we can make it shorter (and faster, since it's probably C under the hood) than solution 1:
import collections
def getDuplicates(iterable):
    """
    Take an iterable and return a generator yielding its duplicate items.
    Items must be hashable.
    e.g :
    >>> sorted(list(getDuplicates([1, 2, 3, 5, 6, 7, 5, 2])))
    [2, 5]
    """
    return (dup for (dup, i) in collections.Counter(iterable).items() if i > 1)
I'd stick with solution 2.
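One thing to be aware of with solution 2: an item that appears three or more times is yielded once for every repeat, so [4, 4, 4] yields 4 twice. If each duplicate should be reported only once, a second set can track what has already been yielded. This is a sketch with an invented name, iter_duplicates, not a replacement the answer itself proposes.

```python
def iter_duplicates(iterable):
    """Yield each duplicated item once, at its second occurrence.

    Items must be hashable.
    """
    seen = set()       # everything encountered so far
    reported = set()   # duplicates that were already yielded
    for item in iterable:
        if item in seen and item not in reported:
            reported.add(item)
            yield item
        seen.add(item)

print(list(iter_duplicates([1, 2, 1, 3, 4, 5, 4, 4, 6, 7, 8, 2])))
# [1, 4, 2]
```

Like solution 2, it is lazy and streams through the input in a single pass.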

You can use this function to find duplicates:
def get_duplicates(arr):
    dup_arr = arr[:]
    for i in set(arr):
        dup_arr.remove(i)
    return list(set(dup_arr))
Examples
print get_duplicates([1,2,3,5,6,7,5,2])
[2, 5]
print get_duplicates([1,2,1,3,4,5,4,4,6,7,8,2])
[1, 2, 4]

If you're looking for one-to-one mapping between your nested loops and Python, this is what you want:
n = len(list_a)
for i in range(n):
    for j in range(i+1, n):
        if list_a[i] == list_a[j]:
            print list_a[i]
The code above is not "Pythonic". I would do it something like this:
seen = set()
for i in list_a:
    if i in seen:
        print i
    else:
        seen.add(i)
Also, don't use __contains__, rather, use in (as above).

The following requires the elements of your list to be hashable (not just implementing __eq__ ).
I find it more pythonic to use a defaultdict (and you have the number of repetitions for free):
import collections
l = [1, 2, 4, 1, 3, 3]
d = collections.defaultdict(int)
for x in l:
    d[x] += 1
print [k for k, v in d.iteritems() if v > 1]
# prints [1, 3]

Using only itertools, and works fine on Python 2.5
from itertools import groupby
list_a = sorted([1, 2, 3, 5, 6, 7, 5, 2])
result = dict([(r, len(list(grp))) for r, grp in groupby(list_a)])
Result:
{1: 1, 2: 2, 3: 1, 5: 2, 6: 1, 7: 1}

It looks like you have a list (list_a) potentially including duplicates, which you would rather keep as it is, and build a de-duplicated list tmp based on list_a. In Python 2.7, you can accomplish this with one line:
tmp = list(set(list_a))
Comparing the lengths of tmp and list_a at this point should clarify if there were indeed duplicate items in list_a. This may help simplify things if you want to go into the loop for additional processing.

You could just "translate" it line by line.
c++
for (int i=0;i<=list_a.length;i++)
    for (int j=i+1;j<=list_a.length;j++)
        if (list_a[i]==list_a[j])
            print list_a[i]
Python
for i in range(0, len(list_a)):
    for j in range(i + 1, len(list_a)):
        if list_a[i] == list_a[j]:
            print list_a[i]
c++ for loop:
for(int x = start; x < end; ++x)
Python equivalent:
for x in range(start, end):
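If the goal is only "every pair (i, j) with i < j", the two nested range loops can also be expressed with itertools.combinations, which generates those pairs in the same order. A small Python 3 sketch; like the nested loops, it is still quadratic and reports a value once per matching pair.

```python
from itertools import combinations

list_a = [1, 2, 3, 5, 6, 7, 5, 2]

# combinations(list_a, 2) yields (list_a[i], list_a[j]) for every i < j
pair_matches = [a for a, b in combinations(list_a, 2) if a == b]
print(pair_matches)  # [2, 5]
```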

Just quick and dirty,
list_a=[1,2,3,5,6,7,5,2]
holding_list=[]
for x in list_a:
    if x in holding_list:
        pass
    else:
        holding_list.append(x)
print holding_list
Output [1, 2, 3, 5, 6, 7]

Using numpy:
import numpy as np
count,value = np.histogram(list_a,bins=np.hstack((np.unique(list_a),np.inf)))
print 'duplicate value(s) in list_a: ' + ', '.join([str(v) for v in value[count>1]])

In the case of Python 3, and if you have two lists:
def removedup(List1,List2):
    List1_copy = List1[:]
    for i in List1_copy:
        if i in List2:
            List1.remove(i)
List1 = [4,5,6,7]
List2 = [6,7,8,9]
removedup(List1,List2)
print (List1)

Granted, I haven't done tests, but I guess it's going to be hard to beat pandas in speed:
pd.DataFrame(list_a, columns=["x"]).groupby('x').size().to_dict()

You can use:
b=['E', 'P', 'P', 'E', 'O', 'E']
c={}
for i in b:
    value=0
    for j in b:
        if(i == j):
            value+=1
    c[i]=value
print(c)
Output:
{'E': 3, 'P': 2, 'O': 1}

Find duplicates in the list using loops, conditional logic, logical operators, and list methods
some_list = ['a','b','c','d','e','b','n','n','c','c','h',]
duplicates = []
for values in some_list:
    if some_list.count(values) > 1:
        if values not in duplicates:
            duplicates.append(values)
print("Duplicate Values are : ",duplicates)

Finding the number of repeating elements in a list:
myList = [3, 2, 2, 5, 3, 8, 3, 4, 'a', 'a', 'f', 4, 4, 1, 8, 'D']
listCleaned = set(myList)
for s in listCleaned:
    count = 0
    for i in myList:
        if s == i :
            count += 1
    print(f'total {s} => {count}')

Try like this:
list_a=[1,2,3,5,6,7,5,2]
unique_values = []
duplicates = []
for i in list_a:
    if i not in unique_values:
        unique_values.append(i)
    else:
        # look for an existing record for this value
        for x in duplicates:
            if x.get("key") == i:
                x["occurrence"] += 1
                break
        else:
            # no record yet: this is the first repeat of i
            duplicates.append({
                "key": i,
                "occurrence": 1
            })

some_string= list(input("Enter any string:\n"))
count={}
dup_count={}
for i in some_string:
    if i not in count:
        count[i]=1
    else:
        count[i]+=1
        dup_count[i]=count[i]
print("Duplicates of given string are below:\n",dup_count)

A slightly more Pythonic implementation (not the most Pythonic, of course), but in the spirit of your C code, could be:
for i, elem in enumerate(seq):
    if elem in seq[i+1:]:
        print elem
Edit: yes, it prints an element more than once if it is repeated more than twice, but that's what the OP's C pseudocode does too.
