How can I find the index of a duplicate element in an array?
mylist = [1,2,3,4,5,6,7,8,9,10,1,2]
How can I find the index numbers of the repeated elements 1 and 2 here?
I don't want the first occurrences; I want to find and delete the trailing duplicate elements from the array, i.e. the 1 and 2 values at the end.
Thanks for your responses.
Single-pass duplicate removal:
mylist = [1,2,3,4,5,6,7,8,9,10,1,2]

def remove_duplicates(l):
    seen = {}
    res = []
    for item in l:
        if item not in seen:
            seen[item] = 1
            res.append(item)
    return res

print(remove_duplicates(mylist))
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
Which also preserves order:
mylist = [1,10,3,4,5,6,7,8,9,2,1,2]
print(remove_duplicates(mylist))
[1, 10, 3, 4, 5, 6, 7, 8, 9, 2]
The simplest way to do this (unless you want to log where the duplicates are) is to convert the list to a set.
A set can contain only one instance of each value.
You can do this very easily as shown below.
Note: a set is represented with curly braces {} like a dictionary, but it does not have key/value pairs.
mylist = [1,2,3,4,5,6,7,8,9,10,1,2]
myset = set(mylist)
# myset = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
EDIT: If the order is important, this method will not work, since a set does not preserve order.
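If you do need the order, one common alternative (a minimal sketch, not part of the answer above) is dict.fromkeys, whose keys keep first-insertion order on Python 3.7+:

mylist = [1,2,3,4,5,6,7,8,9,10,1,2]
deduped = list(dict.fromkeys(mylist))  # keeps first-occurrence order
# deduped = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]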
You can find the duplicates in a loop, memorizing previously seen values in a set.
mylist = [1,2,3,4,5,6,7,8,9,10,1,2]
myset = set()
indices_of_duplicates = []
for ind, val in enumerate(mylist):
    if val in myset:
        indices_of_duplicates.append(ind)
    else:
        myset.add(val)
Then you can delete duplicate elements with
for ind in reversed(indices_of_duplicates):
    del mylist[ind]
Note that we delete elements starting from the end of the list; otherwise each deletion would shift the later elements and we would have to update the indices we found.
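For the example list, the duplicated 1 and 2 sit at indices 10 and 11, so the two passes give:

print(indices_of_duplicates)  # [10, 11]
print(mylist)                 # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]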
eh, I did this:
a = [1,2,3,4,5,5,6,7,7,8]
b = sorted(set(a))  # sorted() pins the order; a bare set()'s iteration order is not guaranteed
al = len(a)
bl = len(b)
iz = []
if al != bl:
    for ai in range(0, al):
        ai0 = a[ai]
        bi0 = b[ai]
        if ai0 != bi0:          # mismatch means a[ai] is a duplicate
            iz.append(ai)
            b.insert(ai, ai0)   # re-align b with a before continuing
# note: this assumes a is already sorted, as in this example
Then the list iz holds all the indices where they don't match, so you just plug those back in.
Just wanted to share!
If you have to remove duplicates, you can use set():
mylist = [x for x in set([1,2,3,4,5,6,7,8,9,10,1,2])]
I'm trying to manipulate a given list in an unusual way (at least for me).
Basically, I have the list a below; its first element is the principal sublist. Now, I want to iterate through the other sublists, and if one of their values matches a value already placed, I want to insert that sublist right after the matching value.
I don't know if I was clear enough, but the goal is the list b below. I think a recursive function should be used here, but I don't know how. Do you guys think it's possible?
Original list:
a = [[1,2,3],[2,5],[6,3],[10,5]]
Expected Output:
b = [[1,2,[2,5,[10,5]],3,[6,3]]]
You could use a dictionary to record where the first occurrence of each number is found: the list in which it was found, and at which index. If a later list has a value that was already encountered, the recorded list can be mutated by inserting the matching list after that value. If there is no match (which is the case for the very first list, [1,2,3]), the list is just appended to the result.
Because an insertion into a list shifts later insertion points, I suggest first collecting the insertion actions and then applying them in reverse order:
Here is the code for that:
def solve(a):
    dct = {}
    result = []
    insertions = []
    for lst in a:
        found = None
        for i, val in enumerate(lst):
            if val in dct:
                found = val
            else:
                dct[val] = [lst, i]
        if found is None:
            result.append(lst)
        else:
            insertions.append((*dct[found], lst))
    for target, i, lst in reversed(insertions):
        target.insert(i + 1, lst)
    return result
# Example run:
a = [[1,2,3],[2,5],[6,3],[10,5]]
print(solve(a))
Output:
[[1, 2, [2, 5, [10, 5]], 3, [6, 3]]]
Problem to solve: Define a Python function remdup(l) that takes a non-empty list of integers l
and removes all duplicates in l, keeping only the last occurrence of each number. For instance:
For instance, remdup([3,1,3,5]) should give us [1,3,5].
def remdup(l):
    for last in reversed(l):
        pos=l.index(last)
        for search in reversed(l[pos]):
            if search==last:
                l.remove(search)
    print(l)
remdup([3,5,7,5,3,7,10])
# intended output [5, 3, 7, 10]
In the for loop on line 4, I want the reversed() call to check each number excluding the current index, but written the way I did above, l[pos] takes the value at pos, not the part of the list from that index. How can I solve this?
You need to reverse the entire slice, not merely one element:
for search in reversed(l[:pos]):
Note that you will likely run into a problem from modifying a list while iterating over it.
It took me a few minutes to figure out the clunky logic. Instead, you need the rest of the list:
for search in reversed(l[pos+1:]):
Output:
[5, 3, 7, 10]
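For reference, here is the asker's function with just that one slice changed (not a rewrite; it still mutates l while iterating, as cautioned above, but it does produce the intended output for the sample input):

def remdup(l):
    for last in reversed(l):
        pos = l.index(last)
        # reverse the rest of the list after pos, not the single element l[pos]
        for search in reversed(l[pos+1:]):
            if search == last:
                l.remove(search)  # removes the earlier (first) occurrence
    print(l)

remdup([3,5,7,5,3,7,10])
# [5, 3, 7, 10]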
Your original algorithm could be improved. The nested loop leads to some unnecessary complexity.
Alternatively, you can do this:
def remdup(l):
    seen = set()
    for i in reversed(l):
        if i in seen:
            l.remove(i)
        else:
            seen.add(i)
    print(l)
I use the 'seen' set to keep track of the numbers that have already appeared.
However, this would be more efficient:
def remdup(l):
    seen = set()
    for i in range(len(l)-1, -1, -1):
        if l[i] in seen:
            del l[i]
        else:
            seen.add(l[i])
    print(l)
In the second algorithm, we iterate over the list in reverse order using a range of indices, and delete any item that already exists in 'seen'. Deleting by index with del l[i] avoids list.remove(), which rescans the list from the front for the first matching value, and it also avoids mutating the list underneath reversed(). It is also clear to see exactly what is happening in the second algorithm, so I would say it is the safer option.
This is a fairly inefficient way of accomplishing this:
def remdup(l):
    i = 0
    while i < len(l):
        v = l[i]
        scan = i + 1
        while scan < len(l):
            if l[scan] == v:
                idx = l.index(v)  # position of the first occurrence of v
                l.remove(v)       # removes that first occurrence
                if idx <= i:      # only shift i when the removal was at or before it
                    i -= 1
                scan -= 1         # the removal always happens before scan
            scan += 1
        i += 1
l = [3,5,7,5,3,7,10]
remdup(l)
print(l)
It essentially walks through the list (indexed by i). For each element, it scans forward in the list for a match, and for each match it finds, it removes the original element. Since removing an element shifts the indices, it adjusts both its indices accordingly before continuing.
It takes advantage of the built-in list.remove: "Remove the first item from the list whose value is equal to x." (l.index(v) identifies that same first occurrence, so i is only adjusted when the removal happens at or before it.)
Here is another solution, iterating backward and popping the index of a previously encountered item:
def remdup(l):
    visited = []
    for i in range(len(l)-1, -1, -1):
        if l[i] in visited:
            l.pop(i)
        else:
            visited.append(l[i])
    print(l)

remdup([3,5,7,5,3,7,10])
# [5, 3, 7, 10]
Using a dictionary:

def remdup(ar):
    d = {}
    for i, v in enumerate(ar):
        d[v] = i
    return [pair[0] for pair in sorted(d.items(), key=lambda x: x[1])]

if __name__ == "__main__":
    test_case = [3, 1, 3, 5]
    output = remdup(test_case)
    expected_output = [1, 3, 5]
    assert output == expected_output, f"Error in {test_case}"
    test_case = [3, 5, 7, 5, 3, 7, 10]
    output = remdup(test_case)
    expected_output = [5, 3, 7, 10]
    assert output == expected_output, f"Error in {test_case}"
Explanation
Keep the last index of each occurrence of the numbers in a dictionary, stored as d[number] = last_occurrence.
Sort the dictionary by values and use list comprehension to make a new list from the keys of the dictionary.
Along with the other right answers, here's one more:

from iteration_utilities import duplicates
import numpy as np

list1 = [3,5,7,5,3,7,10]
dup = np.sort(list(duplicates(list1)))  # the duplicated values, sorted
list2 = list1.copy()
for j, i in enumerate(list2):
    try:
        if dup[j] == i:
            list1.remove(dup[j])        # drops the first occurrence
    except IndexError:                  # no duplicates left to match
        break
print(list1)
How about this one-liner? (Converting it to a function is easy enough as an exercise.)
# one-liner version
lst = [3,5,7,5,3,7,10]
>>> list(dict.fromkeys(reversed(lst)))[::-1]
# [5, 3, 7, 10]
If you don't want a new list, you can do this instead:
lst[:] = list(dict.fromkeys(reversed(lst)))[::-1]
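To see why this keeps the last occurrence of each value, here is the same one-liner unrolled (intermediate values shown for this lst):

lst = [3,5,7,5,3,7,10]
rev = list(reversed(lst))          # [10, 7, 3, 5, 7, 5, 3]
firsts = list(dict.fromkeys(rev))  # [10, 7, 3, 5] - first occurrence in rev = last in lst
result = firsts[::-1]              # [5, 3, 7, 10]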
I have a python list where elements can repeat.
>>> a = [1,2,2,3,3,4,5,6]
I want to get the first n unique elements from the list.
So, in this case, if I want the first 5 unique elements, they would be:
[1,2,3,4,5]
I have come up with a solution using generators:
def iterate(itr, upper=5):
    count = 0
    for index, element in enumerate(itr):
        if index == 0:
            count += 1
            yield element
        elif element not in itr[:index] and count < upper:
            count += 1
            yield element
In use:
>>> i = iterate(a, 5)
>>> [e for e in i]
[1,2,3,4,5]
I have doubts about this being the most optimal solution. Is there an alternative strategy I can implement to write it in a more Pythonic and efficient way?
I would use a set to remember what was seen and return from the generator when you have seen enough:
a = [1, 2, 2, 3, 3, 4, 5, 6]

def get_unique_N(iterable, N):
    """Yields (in order) the first N unique elements of iterable.
    Might yield fewer if the data is too short."""
    seen = set()
    for e in iterable:
        if e in seen:
            continue
        seen.add(e)
        yield e
        if len(seen) == N:
            return

k = get_unique_N([1, 2, 2, 3, 3, 4, 5, 6], 4)
print(list(k))
Output:
[1, 2, 3, 4]
According to PEP 479, you should return from generators rather than raise StopIteration - thanks to @khelwood & @iBug for that piece of comment - one never stops learning.
With 3.6 you get a DeprecationWarning; with 3.7 it raises a RuntimeError: see the transition plan if you are still using raise StopIteration.
Your solution using elif element not in itr[:index] and count<upper: uses O(k) lookups - with k being the length of the slice - using a set reduces this to O(1) lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependent.
Consider [1, 2, 3, 4, 4, 4, 4, 5] vs [1] * 1000 + [2] * 1000 + [3] * 1000 + [4] * 1000 + [5] * 1000 + [6]:
For 6 unique values (in the longer list):
you would have lookups of O(1)+O(2)+...+O(5001)
mine would have 5001 O(1) lookups plus the memory for set({1, 2, 3, 4, 5, 6})
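If you want to check the tradeoff on your own data, a minimal timing sketch (assuming both the iterate generator from the question and get_unique_N from above are defined in scope):

from timeit import timeit

a = [1] * 1000 + [2] * 1000 + [3] * 1000 + [4] * 1000 + [5] * 1000 + [6]

# slice-based membership test vs. set-based membership test
t_slice = timeit(lambda: list(iterate(a, 6)), number=10)
t_set = timeit(lambda: list(get_unique_N(a, 6)), number=10)
print("slice: %.3fs  set: %.3fs" % (t_slice, t_set))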
You can adapt the popular itertools unique_everseen recipe:
def unique_everseen_limit(iterable, limit=5):
    seen = set()
    seen_add = seen.add
    for element in iterable:
        if element not in seen:
            seen_add(element)
            yield element
            if len(seen) == limit:
                break

a = [1,2,2,3,3,4,5,6]
res = list(unique_everseen_limit(a)) # [1, 2, 3, 4, 5]
Alternatively, as suggested by @Chris_Rands, you can use itertools.islice to extract a fixed number of values from a non-limited generator:

from itertools import islice

def unique_everseen(iterable):
    seen = set()
    seen_add = seen.add
    for element in iterable:
        if element not in seen:
            seen_add(element)
            yield element

res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
Note the unique_everseen recipe is available in 3rd party libraries via more_itertools.unique_everseen or toolz.unique, so you could use:
from itertools import islice
from more_itertools import unique_everseen
from toolz import unique
res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
res = list(islice(unique(a), 5)) # [1, 2, 3, 4, 5]
If your objects are hashable (ints are hashable), you can write a utility function using the fromkeys method of the collections.OrderedDict class (or, starting from Python 3.7, a plain dict, since dicts became officially ordered), like:
from collections import OrderedDict

def nub(iterable):
    """Returns unique elements preserving order."""
    return OrderedDict.fromkeys(iterable).keys()

and then the implementation of iterate can be simplified to:

from itertools import islice

def iterate(itr, upper=5):
    return islice(nub(itr), upper)

or, if you always want a list as the output:

def iterate(itr, upper=5):
    return list(nub(itr))[:upper]
Improvements
As @Chris_Rands mentioned, this solution walks through the entire collection; we can improve on it by writing the nub utility as a generator, as others already did:
def nub(iterable):
    seen = set()
    add_seen = seen.add
    for element in iterable:
        if element in seen:
            continue
        yield element
        add_seen(element)
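Combined with islice as above, the generator version stops scanning as soon as enough unique elements have been produced:

from itertools import islice

a = [1, 2, 2, 3, 3, 4, 5, 6]
print(list(islice(nub(a), 5)))  # [1, 2, 3, 4, 5]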
Here is a Pythonic approach using itertools.takewhile():
In [95]: from itertools import takewhile
In [96]: seen = set()
In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))
Out[97]: {1, 2, 3, 4}
You can use OrderedDict or, since Python 3.7, an ordinary dict, since they are implemented to preserve the insertion order. Note that this won't work with sets.
N = 3
a = [1, 2, 2, 3, 3, 3, 4]
d = {x: True for x in a}
list(d.keys())[:N]
There are really amazing answers to this question: fast, compact and brilliant! The reason I am putting this code here is that I believe there are plenty of cases when you don't care about losing a microsecond, nor do you want additional libraries in your code, just to solve a simple task once.
a = [1,2,2,3,3,4,5,6]
res = []
for x in a:
    if x not in res:  # yes, not optimal, but doesn't need an additional dict
        res.append(x)
        if len(res) == 5:
            break
print(res)
Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby function in itertools:
from itertools import groupby, islice

def first_unique(data, upper):
    return islice((key for (key, _) in groupby(data)), 0, upper)

a = [1, 2, 2, 3, 3, 4, 5, 6]
print(list(first_unique(a, 5)))
Updated to use islice instead of enumerate per @juanpa.arrivillaga. You don't even need a set to keep track of duplicates.
Using set with sorted + key:
sorted(set(a), key=list(a).index)[:5]
Out[136]: [1, 2, 3, 4, 5]
Given
import itertools as it
a = [1, 2, 2, 3, 3, 4, 5, 6]
Code
A simple list comprehension (similar to @cdlane's answer).
[k for k, _ in it.groupby(a)][:5]
# [1, 2, 3, 4, 5]
Alternatively, in Python 3.6+:
list(dict.fromkeys(a))[:5]
# [1, 2, 3, 4, 5]
Profiling Analysis
Solutions
Which solution is the fastest? There are two clear favorite answers (and 3 solutions) that captured most of the votes.
The solution by Patrick Artner - denoted as PA.
The first solution by jpp - denoted as jpp1.
The second solution by jpp - denoted as jpp2.
This is because these claim to run in O(N) while others here run in O(N^2), or do not guarantee the order of the returned list.
Experiment setup
For this experiment 3 variables were considered.
N elements. The number of first N elements the function is searching for.
List length. The longer the list the further the algorithm has to look to find the last element.
Repeat limit. How many times an element can repeat before the next element occurs in the list. This is uniformly distributed between 1 and the repeat limit.
The assumptions for data generation were as follows. How strict these are depends on the algorithm used, but they are more a note on how the data was generated than a limitation on the algorithms themselves.
The elements never occur again after its repeated sequence first appears in the list.
The elements are numeric and increasing.
The elements are of type int.
So in a list of [1,1,1,2,2,3,4 ....] 1,2,3 would never appear again. The next element after 4 would be 5, but there could be a random number of 4s up to the repeat limit before we see 5.
A new dataset was created for each combination of variables and re-generated 20 times. The Python timeit function was used to profile the algorithms 50 times on each dataset. The mean time of the 20x50=1000 runs (for each combination) is reported here. Since the algorithms are generators, their outputs were converted to a list to get the execution time.
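The exact harness was not posted; the sketch below is a hypothetical reconstruction that follows the generation rules above (data_gen is a stand-in name, and get_unique_N is Patrick Artner's function):

import random
from timeit import timeit

def data_gen(n_unique, repeat_limit):
    # increasing integers, each repeated a uniform-random number of times
    data = []
    for value in range(n_unique):
        data.extend([value] * random.randint(1, repeat_limit))
    return data

data = data_gen(200, 100)  # roughly 10k elements on average
t = timeit(lambda: list(get_unique_N(data, 50)), number=50)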
Results
As expected, the more elements searched for, the longer it takes. This graph shows that the execution time is indeed O(N), as claimed by the authors (the straight line is consistent with a linear cost).
Fig 1. Varying the first N elements searched for.
All three solutions do not consume additional computation time beyond that which is required. The below image shows what happens when the list is limited in size, and not N elements. Lists of length 10k, with elements repeating a maximum of 100 times (and thus on average repeating 50 times) would on average run out of unique elements by 200 (10000/50). If any of these graphs showed an increase in computation time beyond 200 this would be a cause for concern.
Fig 2. The effect of first N elements chosen > number of unique elements.
The figure below again shows that processing time increases (at a rate of O(N)) the more data the algorithm has to sift through. The rate of increase is the same as when first N elements were varied. This is because stepping through the list is the common execution block in both, and the execution block that ultimately decides how fast the algorithm is.
Fig 3. Varying the repeat limit.
Conclusion
The second solution posted by jpp (jpp2) is the fastest of the three in all cases. It is only slightly faster than Patrick Artner's solution, and almost twice as fast as jpp's first solution.
Why not use something like this?
>>> a = [1, 2, 2, 3, 3, 4, 5, 6]
>>> list(set(a))[:5]
[1, 2, 3, 4, 5]
Example list:
a = [1, 2, 2, 3, 3, 4, 5, 6]
The function returns all unique items, or just the requested count of unique items, from the list.
1st argument - the list to work with; 2nd argument (optional) - count of unique items (by default None, meaning all unique elements will be returned).
def unique_elements(lst, number_of_elements=None):
    return list(dict.fromkeys(lst))[:number_of_elements]
Here is an example of how it works. The list name is "a", and we need to get 2 unique elements:
print(unique_elements(a, 2))
Output:
[1, 2]
a = [1,2,2,3,3,4,5,6]
from collections import defaultdict
def function(lis, n):
    dic = defaultdict(int)
    sol = set()
    for i in lis:
        try:
            if dic[i]:
                pass
            else:
                sol.add(i)
                dic[i] = 1
                if len(sol) >= n:
                    break
        except KeyError:
            pass
    return list(sol)
print(function(a,3))
output
[1, 2, 3]
I am trying to find the least common element in an array of integers and remove it, and return the list in the same order.
This is what I have done, but when the list is [1, 2, 3, 4, 5], my function should return [], but returns [2, 4] instead.
def check(data):
    for i in data:
        if data.count(i) <= 1:
            data.remove(i)
    return data

data = [1, 2, 3, 4, 5]
print check(data)
Deleting items from a list you are iterating over causes you to skip items (i.e. the item immediately after each one you delete).
Instead, make a new list containing only the values you want to keep.
from collections import Counter
def check(data):
    ctr = Counter(data)
    least = min(ctr.values())
    return [d for d in data if ctr[d] > least]
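With this version, both of the asker's cases come out as expected:

print(check([1, 2, 3, 4, 5]))     # [] - every element is least common
print(check([1, 2, 2, 3, 3, 3]))  # [2, 2, 3, 3, 3] - only the single 1 is dropped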
You shouldn't modify (especially delete) elements from a list while you are iterating over it.
What happened is:
Initially the iterator is at the 1st element, i.e. i = 1.
Since data.count(1) is 1, you delete 1 from the list.
The list is now [2,3,4,5], but the iterator advances to the 2nd element, which is now the 3.
Since data.count(3) is 1, you delete it, making the list [2,4,5].
The iterator advances to the 3rd element, which is now 5.
Again you delete the 5, making the list [2,4].
Your algorithm should instead (see the sketch after this list):
Get a count of all elements.
Find the smallest count.
Find the elements with the smallest count.
Remove the elements found in step 3 from the list.
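A minimal sketch of those four steps (one possible reading, using collections.Counter; remove_least_common is a hypothetical name):

from collections import Counter

def remove_least_common(data):
    counts = Counter(data)                                    # 1. count all elements
    smallest = min(counts.values())                           # 2. find the smallest count
    least = {k for k, c in counts.items() if c == smallest}   # 3. elements with that count
    return [x for x in data if x not in least]                # 4. remove them, keeping order

print(remove_least_common([1, 2, 3, 4, 5]))        # []
print(remove_least_common([1, 1, 2, 2, 3, 3, 3]))  # [3, 3, 3]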
You shouldn't check data.count(i) <= 1. What happens in this case: [1, 1, 2, 2, 3, 3, 3]? 1 and 2 are the least common elements but you will never delete them. Likewise it is a bad idea to mutate a list in a for loop.
One thing you can do is use the Counter class.
Take an appropriate slice of the tail of most_common() (the entries get less frequent as you go down the list, which is why you take the tail rather than the head).
Then you can search the list for those occurrences and remove them until there are none left.
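A sketch of that idea (one possible reading; instead of repeatedly calling remove, this filters the list in a single pass, which has the same effect):

from collections import Counter

def check(data):
    pairs = Counter(data).most_common()  # ordered most -> least frequent
    least_count = pairs[-1][1]           # the tail holds the minimum count
    least = {elem for elem, cnt in pairs if cnt == least_count}
    return [x for x in data if x not in least]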
Another try:

from collections import Counter

def check(data):
    ctr = Counter(data)
    keys = ctr.keys()    # in Python 2 these are plain lists
    vals = ctr.values()
    least = []
    m = min(vals)
    for i in range(0, len(vals)):
        if vals[i] == m:
            least.append(keys[i])
    print least          # Python 2 print statement, matching the question

data = [1, 2, 3, 4, 5, 1]
result = check(data)