Is it possible to get which values are duplicates in a list using Python?
I have a list of items:
mylist = [20, 30, 25, 20]
I know the best way of removing the duplicates is set(mylist), but is it possible to know which values are duplicated? As you can see, in this list the duplicates are the first and last values, at indices [0, 3].
Is it possible to get this result or something similar in Python? I'm trying to avoid writing a huge if/elif chain.
These answers are O(n): a little more code than using mylist.count(), but much more efficient as mylist gets longer, since calling count() for every element is O(n²).
If you just want to know the duplicates, use collections.Counter:
from collections import Counter
mylist = [20, 30, 25, 20]
[k for k,v in Counter(mylist).items() if v>1]
If you need to know the indices, use collections.defaultdict:
from collections import defaultdict
D = defaultdict(list)
for i, item in enumerate(mylist):
    D[item].append(i)
# Keep only values that occur more than once, e.g. {20: [0, 3]}
D = {k: v for k, v in D.items() if len(v) > 1}
Here's a list comprehension that does what you want. As @Codemonkey says, list indices start at 0, so the indices of the duplicates are 0 and 3.
>>> [i for i, x in enumerate(mylist) if mylist.count(x) > 1]
[0, 3]
You can use a list comprehension over set(my_list) to reduce the work: count() is then called once per distinct value rather than once per element.
my_list = [3, 5, 2, 1, 4, 4, 1]
opt = [item for item in set(my_list) if my_list.count(item) > 1]
The following list comprehension yields every occurrence of each duplicated value (for mylist it gives [20, 20]):
[x for x in mylist if mylist.count(x) >= 2]
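The count()-based comprehensions above call count() once per element (or once per distinct value), so they are O(n²) in the worst case. A minimal sketch that computes the same indices in O(n) by counting everything once with collections.Counter, assuming the items are hashable:

```python
from collections import Counter

mylist = [20, 30, 25, 20]

# Count every value once (O(n)) instead of calling mylist.count() per element
counts = Counter(mylist)

# Indices of all elements whose value appears more than once
dup_indices = [i for i, x in enumerate(mylist) if counts[x] > 1]
print(dup_indices)  # [0, 3]
```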
The simplest way, without any intermediate list, uses list.index(), which returns the index of the first occurrence. This keeps the first occurrence of each value:
>>> z = ['a', 'b', 'a', 'c', 'b', 'a']
>>> [z[i] for i in range(len(z)) if i == z.index(z[i])]
['a', 'b', 'c']
You can also list the duplicates themselves (the result may contain repeats, as in this example):
>>> [z[i] for i in range(len(z)) if i != z.index(z[i])]
['a', 'b', 'a']
or their indices:
>>> [i for i in range(len(z)) if i != z.index(z[i])]
[2, 4, 5]
or the duplicates as a list of 2-tuples, each pairing a duplicate's index with the index of its first occurrence, which answers the original question:
>>> [(i, z.index(z[i])) for i in range(len(z)) if i != z.index(z[i])]
[(2, 0), (4, 1), (5, 0)]
or the same together with the item itself:
>>> [(i, z.index(z[i]), z[i]) for i in range(len(z)) if i != z.index(z[i])]
[(2, 0, 'a'), (4, 1, 'b'), (5, 0, 'a')]
or any other combination of elements and indices. Note that each z.index() call scans the list, so these comprehensions are O(n²).
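Because each list.index() call rescans the list, a single pass with a dictionary of first occurrences gives the same (index, first index) pairs in O(n). A sketch, assuming the items are hashable (list.index() itself has no such requirement); first_seen and pairs are illustrative names:

```python
z = ['a', 'b', 'a', 'c', 'b', 'a']

first_seen = {}   # value -> index of its first occurrence
pairs = []        # (index of a repeat, index of the first occurrence)
for i, item in enumerate(z):
    if item in first_seen:
        pairs.append((i, first_seen[item]))
    else:
        first_seen[item] = i

print(pairs)  # [(2, 0), (4, 1), (5, 0)]
```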
I used the code below to find duplicate values in a list:
1) Create a set from the list, so each value appears once.
2) Iterate through the set, counting each value in the original list.
glist = [1, 2, 3, "one", 5, 6, 1, "one"]
x = set(glist)
dup = []
for c in x:
    if glist.count(c) > 1:
        dup.append(c)
print(dup)
OUTPUT
[1, 'one']
Now get all the indices of each duplicate element:
glist = [1, 2, 3, "one", 5, 6, 1, "one"]
x = set(glist)
dup = []
for c in x:
    if glist.count(c) > 1:
        indices = [i for i, v in enumerate(glist) if v == c]
        dup.append((c, indices))
print(dup)
OUTPUT
[(1, [0, 6]), ('one', [3, 7])]
Hope this helps someone
This is the simplest way I can think of to find duplicates in a list:
my_list = [3, 5, 2, 1, 4, 4, 1]
my_list.sort()
for i in range(len(my_list) - 1):
    # After sorting, equal values are adjacent
    if my_list[i] == my_list[i + 1]:
        print(str(my_list[i]) + ' is a duplicate')
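The loop above prints a value once for every extra copy (a value appearing three times is printed twice) and sorts my_list in place. A sketch using itertools.groupby on a sorted copy that reports each duplicated value once, assuming the values are mutually comparable so that sorted() works:

```python
from itertools import groupby

my_list = [3, 5, 2, 1, 4, 4, 1]

# sorted() returns a new list, so my_list itself is left unchanged;
# groupby() then yields one group per run of equal values
duplicates = [value for value, group in groupby(sorted(my_list))
              if len(list(group)) > 1]
print(duplicates)  # [1, 4]
```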
The following code prints each duplicated item together with the index of its first occurrence (list.index() returns only the first match):
for i in set(mylist):
    if mylist.count(i) > 1:
        print(i, mylist.index(i))
You should sort the list:
mylist.sort()
After this, iterate through it like this:
doubles = []
for i, elem in enumerate(mylist):
    if i != 0:
        if elem == old:
            doubles.append(elem)
            old = None
            continue
    old = elem
You can print the duplicate and unique values using the logic below:
def dup(x):
    duplicate = []
    unique = []
    for i in x:
        if i in unique:
            duplicate.append(i)
        else:
            unique.append(i)
    print("Duplicate values: ", duplicate)
    print("Unique Values: ", unique)
list1 = [1, 2, 1, 3, 2, 5]
dup(list1)
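`i in unique` scans a list, so dup() becomes O(n²) for long inputs. A hypothetical variant, dup_with_set, that tracks already-seen values in a set and returns the two lists instead of printing them, assuming the items are hashable:

```python
def dup_with_set(x):
    """Hypothetical set-based variant of dup(); returns the lists instead of printing."""
    seen = set()      # average O(1) membership tests, unlike `i in unique` on a list
    duplicate = []    # every repeat occurrence, in order
    unique = []       # first occurrence of each value, in original order
    for i in x:
        if i in seen:
            duplicate.append(i)
        else:
            seen.add(i)
            unique.append(i)
    return duplicate, unique

duplicate, unique = dup_with_set([1, 2, 1, 3, 2, 5])
print("Duplicate values:", duplicate)  # Duplicate values: [1, 2]
print("Unique Values:", unique)        # Unique Values: [1, 2, 3, 5]
```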
mylist = [20, 30, 25, 20]
kl = {i: mylist.count(i) for i in mylist if mylist.count(i) > 1 }
print(kl)
It looks like you want the indices of the duplicates. Here is some short code that will find those in O(n) time, without using any packages:
dups = {}
[dups.setdefault(v, []).append(i) for i, v in enumerate(mylist)]
dups = {k: v for k, v in dups.items() if len(v) > 1}
# dups now has keys for all the duplicate values
# and a list of matching indices for each
# The second line produces an unused list.
# It could be replaced with this:
for i, v in enumerate(mylist):
    dups.setdefault(v, []).append(i)
m = len(mylist)
for index, value in enumerate(mylist):
    for i in range(1, m):
        if index != i:
            if mylist[i] == mylist[index]:
                print("Location %d and location %d has same list-entry: %r" % (index, i, value))
This has some redundancy that can be improved, however: a pair of equal entries can be reported twice, and comparing every element with every other is O(n²).
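One way to remove that redundancy, sketched here as an assumption rather than the answer's own method: group the indices by value in one pass, then use itertools.combinations to emit each pair of equal entries exactly once.

```python
from collections import defaultdict
from itertools import combinations

mylist = [20, 30, 25, 20]

# Group the indices by value in a single pass
positions = defaultdict(list)
for index, value in enumerate(mylist):
    positions[value].append(index)

# Each pair (a, b) with a < b is produced exactly once per value
pairs = [(a, b, value)
         for value, indices in positions.items()
         for a, b in combinations(indices, 2)]

for a, b, value in pairs:
    print("Location %d and location %d has same list-entry: %r" % (a, b, value))
# Location 0 and location 3 has same list-entry: 20
```

The work is one pass over the list plus one step per reported pair.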
def checkduplicate(lists):
    a = []
    for i in lists:
        if i in a:
            return i  # first value that appears a second time
        else:
            a.append(i)
    return None  # no duplicates found
print(checkduplicate([1,9,78,989,2,2,3,6,8]))
Related
Suppose I have a list a = [-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1] in Python. Is there a built-in function that takes a list and returns the index ranges at which each element occurs? For example:
>>> index_range(a)
{-1 :'0-2,9-11', 1:'3-5,12-14', 2:'6-8'}
I have tried collections.Counter, but it only outputs the count of each element.
If there is no built-in function, can you please guide me on how to achieve this in my own function? Not the whole code, just a guideline.
You can create a custom function using itertools.groupby and collections.defaultdict to get the index ranges as a list of (start, end) tuples:
from itertools import groupby
from collections import defaultdict
def index_range(my_list):
    my_dict = defaultdict(list)
    # groupby yields one group per run of equal values; each item is an (index, value) pair
    for _, group in groupby(enumerate(my_list), key=lambda x: x[1]):
        indices, values = list(zip(*group))
        my_dict[values[0]].append((indices[0], indices[-1]))
    return my_dict
Sample Run:
>>> index_range([-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1])
{1: [(3, 5), (12, 14)], 2: [(6, 8)], -1: [(0, 2), (9, 11)]}
To get the values as strings in your dict, you can either modify the above function or post-process its return value with a dictionary comprehension:
>>> result_dict = index_range([-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1])
>>> {k: ','.join('{}:{}'.format(*i) for i in v) for k, v in result_dict.items()}
{1: '3:5,12:14', 2: '6:8', -1: '0:2,9:11'}
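If you want the exact string format from the question ('0-2,9-11'), a single groupby pass can build it directly by tracking where each run starts. A sketch with a hypothetical function name, index_range_str; a run of length one is shown as, e.g., '4-4':

```python
from itertools import groupby

def index_range_str(my_list):
    """Map each value to a comma-separated string of its run ranges, e.g. '0-2,9-11'."""
    ranges = {}
    start = 0
    for value, group in groupby(my_list):
        length = sum(1 for _ in group)   # size of this run
        end = start + length - 1
        ranges.setdefault(value, []).append("%d-%d" % (start, end))
        start = end + 1                  # the next run begins right after this one
    return {k: ",".join(v) for k, v in ranges.items()}

print(index_range_str([-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1]))
# {-1: '0-2,9-11', 1: '3-5,12-14', 2: '6-8'}
```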
You can use a dict that uses list items as keys and their indexes as values:
>>> lst = [-1,-1,-1,1,1,1,2,2,2,-1,-1,-1,1,1,1]
>>> indexes = {}
>>> for index, item in enumerate(lst):
...     indexes.setdefault(item, []).append(index)
>>> indexes
{1: [3, 4, 5, 12, 13, 14], 2: [6, 7, 8], -1: [0, 1, 2, 9, 10, 11]}
You could then merge the index lists into ranges if that's what you need. I can help you with that too if necessary.
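Merging the index lists into ranges can be sketched with a hypothetical helper, to_ranges, which assumes each list is sorted (as the lists built with enumerate() are) and collapses consecutive integers into (start, end) tuples:

```python
def to_ranges(indices):
    """Collapse a sorted list of ints into (start, end) tuples of consecutive runs."""
    ranges = []
    for i in indices:
        if ranges and i == ranges[-1][1] + 1:
            ranges[-1][1] = i          # extend the current run
        else:
            ranges.append([i, i])      # start a new run
    return [tuple(r) for r in ranges]

indexes = {1: [3, 4, 5, 12, 13, 14], 2: [6, 7, 8], -1: [0, 1, 2, 9, 10, 11]}
print({k: to_ranges(v) for k, v in indexes.items()})
# {1: [(3, 5), (12, 14)], 2: [(6, 8)], -1: [(0, 2), (9, 11)]}
```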
list1 = [1,2,5,6,7,8,10,41,69,78,83,100,105,171]
index_list = [0,4,7,9,10]
How do I pop the items at the indices in index_list from list1?
output_list = [2,5,6,8,10,69,100,105,171]
How about the opposite: keep only the elements whose indices are not in index_list:
>>> list1 = [1,2,5,6,7,8,10,41,69,78,83,100,105,171]
>>> index_list = [0,4,7,9,10]
>>> index_set = set(index_list) # optional but faster
>>> [x for i, x in enumerate(list1) if i not in index_set]
[2, 5, 6, 8, 10, 69, 100, 105, 171]
Note: This does not modify the existing list but creates a new one.
list1 = [1,2,5,6,7,8,10,41,69,78,83,100,105,171]
index_list = [0,4,7,9,10]
print([ t[1] for t in enumerate(list1) if t[0] not in index_list])
RESULT
[2, 5, 6, 8, 10, 69, 100, 105, 171]
enumerate yields (index, item) pairs like these:
[(0, 1), (1, 2), (2, 5), (3, 6), (4, 7), (5, 8), ... (13, 171)]
where each t is an (index, item) tuple, e.g. t = (0, 1), so:
t[0] = index
t[1] = item
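You could try this:
for index in sorted(index_list, reverse=True):
    list1.pop(index)
print(list1)
pop() takes an optional index argument and removes (and returns) the element at that index. Popping the indices in descending order keeps earlier removals from shifting the positions of the ones still to be removed.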
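Use list.remove(item) (note that remove() deletes the first element equal to the value, which may be at a different position if the list contains duplicates):
for n in reversed(index_list):
    list1.remove(list1[n])
or list.pop(index):
for n in reversed(index_list):
    list1.pop(n)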
Both methods are described here https://docs.python.org/2/tutorial/datastructures.html
Use reversed() on your index_list (assuming that the indices are always ordered like in the case you have shown), so you remove items from the end of the list and it should work fine.
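The reversed() approach depends on index_list being sorted in ascending order with no repeats. A sketch that removes the items in place regardless of order, dropping repeated indices with set() and deleting with del from the highest index down:

```python
list1 = [1, 2, 5, 6, 7, 8, 10, 41, 69, 78, 83, 100, 105, 171]
index_list = [0, 4, 7, 9, 10]

# Delete from the highest index down so earlier deletions don't shift later ones;
# set() drops repeated indices, sorted(..., reverse=True) handles any input order
for index in sorted(set(index_list), reverse=True):
    del list1[index]

print(list1)  # [2, 5, 6, 8, 10, 69, 100, 105, 171]
```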
X = [[1,2], [5,1], [1,2], [2,-1] , [5,1]]
I want to count the frequency of repeated elements, for example [1,2].
Unless speed is really an issue, the simplest approach is to map the sublists to tuples and use a Counter dict:
X = [[1,2], [5,1], [1,2], [2,-1] , [5,1]]
from collections import Counter
cn = Counter(map(tuple, X))
print(cn)
print(list(filter(lambda x:x[1] > 1,cn.items())))
Counter({(1, 2): 2, (5, 1): 2, (2, -1): 1})
[((1, 2), 2), ((5, 1), 2)]
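If you consider [1, 2] equal to [2, 1], you could use a frozenset: Counter(map(frozenset, X)). Note that a frozenset also discards repeated elements inside a sublist, so [1, 1] and [1] would be counted as the same key.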
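The frozenset caveat can be seen with a small hypothetical X. Counting sorted tuples instead treats [1, 2] and [2, 1] as equal while keeping repeated elements, assuming the elements within each sublist are mutually comparable:

```python
from collections import Counter

X = [[1, 2], [2, 1], [1, 1], [1]]

# frozenset ignores order AND multiplicity: [1, 1] and [1] collapse to frozenset({1})
by_frozenset = Counter(map(frozenset, X))
print(by_frozenset[frozenset([1])])  # 2

# A sorted tuple ignores order but keeps repeated elements: (1, 1) != (1,)
by_sorted = Counter(tuple(sorted(x)) for x in X)
print(by_sorted[(1, 2)], by_sorted[(1, 1)], by_sorted[(1,)])  # 2 1 1
```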
Take a look at numpy.unique: http://docs.scipy.org/doc/numpy-1.10.1/reference/generated/numpy.unique.html
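You can use the return_counts argument to get the count of each item. Pass axis=0 (supported since NumPy 1.13) so that whole rows are compared; without it, numpy.unique flattens X and counts the individual numbers:
import numpy
values, counts = numpy.unique(X, axis=0, return_counts=True)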
repeated = values[counts > 1]
Assuming I understand what you want:
Count each item of your list in a dictionary, then select the items whose count is greater than 1.
The following code might help you:
freq = dict()
for item in X:
    if tuple(item) not in freq:
        freq[tuple(item)] = 1
    else:
        freq[tuple(item)] += 1
print({k: v for (k, v) in freq.items() if v > 1})
That code will give you the output:
{(1, 2): 2, (5, 1): 2}
Given a list x e.g.
[4,6,7,21,1,7,3]
I need to extract those values that are less than or equal to 4. This is easily done, but I also need to take some note of where in the list those values occurred. If all values were unique I know I could probably use list.index() in some way. But there will be duplicated values. How best to achieve this?
How about simply:
[(i, val) for i, val in enumerate([4,6,7,21,1,7,3]) if val <= 4]
or depending on your use-case, perhaps a dictionary would be more suitable? Either from index to value:
{i:val for i, val in enumerate([4,6,7,21,1,7,3]) if val <= 4}
or from value to index:
from collections import defaultdict
indexes = defaultdict(list)
for i, val in enumerate([4,6,7,21,1,7,3]):
    if val <= 4:
        indexes[val].append(i)
You can make another list that stores tuples with each element less than or equal to 4 as the first item and its index as the second, like this:
my_list = [4, 6, 7, 21, 1, 7, 3]
req_list = []
for i in range(len(my_list)):
    e = my_list[i]
    if e <= 4:
        req_list.append((e, i))
Here req_list will hold pairs whose first item is an element less than or equal to 4 and whose second item is that element's index.
For example, if
my_list = [4, 6, 7, 21, 1, 7, 3]
then
req_list = [(4, 0), (1, 4), (3, 6)]
Suppose the list
[7,7,7,7,3,1,5,5,1,4]
I would like to remove duplicates and count them while preserving the order of the list. To remove duplicates while preserving order, I use this function:
def unique(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result
which gives me the output
[7,3,1,5,4]
but the output I want (the final list may contain repeated values) is:
[7,3,3,1,5,2,4]
7 is written first because it's the first item in the list; then each following item is checked to see whether it differs from the previous one. If it does, count the occurrences of that item until a new one is found, then repeat the procedure. Could anyone more skilled than me give me a hint toward the desired output above? Thank you in advance.
Perhaps something like this?
>>> from itertools import groupby
>>> lst = [7,7,7,7,3,1,5,5,1,4]
>>> seen = set()
>>> out = []
>>> for k, g in groupby(lst):
...     if k not in seen:
...         length = sum(1 for _ in g)
...         if length > 1:
...             out.extend([k, length])
...         else:
...             out.append(k)
...         seen.add(k)
...
>>> out
[7, 4, 3, 1, 5, 2, 4]
Update:
As per your comment I guess you wanted something like this:
>>> out = []
>>> for k, g in groupby(lst):
...     length = sum(1 for _ in g)
...     if length > 1:
...         out.extend([k, length])
...     else:
...         out.append(k)
...
>>> out
[7, 4, 3, 1, 5, 2, 1, 4]
Try this:
import collections as c
lst = [7,7,7,7,3,1,5,5,1,4]
result = c.OrderedDict()
for el in lst:
    if el not in result:
        result[el] = 1
    else:
        result[el] = result[el] + 1
print(result)
prints out: OrderedDict([(7, 4), (3, 1), (1, 2), (5, 2), (4, 1)])
It gives a dictionary though. For a list, use:
lstresult = []
for el in result:
    lstresult.append(el)
    if result[el] > 1:
        lstresult.append(result[el] - 1)
It doesn't match your desired output, but your desired output also seems like a somewhat mangled version of what you are trying to represent.
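What the question describes (write a value, then how many times it repeats before a new value appears) is run-length encoding. Assuming that is the goal, a compact groupby sketch produces (value, run length) pairs, which can then be flattened into the format of the Update answer above:

```python
from itertools import groupby

lst = [7, 7, 7, 7, 3, 1, 5, 5, 1, 4]

# One (value, run length) pair per run of consecutive equal items
runs = [(k, sum(1 for _ in g)) for k, g in groupby(lst)]
print(runs)  # [(7, 4), (3, 1), (1, 1), (5, 2), (1, 1), (4, 1)]

# Flattened, writing the count only when a run is longer than 1
flat = [x for k, n in runs for x in ((k, n) if n > 1 else (k,))]
print(flat)  # [7, 4, 3, 1, 5, 2, 1, 4]
```

Keeping the pairs avoids the ambiguity of the flattened form, where a count and a value can look the same.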