This question already has answers here:
Removing elements that have consecutive duplicates
(9 answers)
Closed 3 years ago.
I have a list like this,
l=[1,1,1,1,2,2,2,3,3,3,3,3,3,4,4,5,6,6,6,5,5,5,7,7,8,8,8,8,8,9,9,9,10,10]
Now I want to select only one values of repeating value continuously, so the output should look like,
l=[1,2,3,4,5,6,5,7,8,9,10]
I could do this using a for loop checking the next values with the previous and append to a list, But the execution time is huge in that case, Looking for shortcuts to do it most efficiently.
You can use itertools.groupby to group consecutive values and keep only the grouping key:
from itertools import groupby
[k for k,_ in groupby(l)]
# [1, 2, 3, 4, 5, 6, 5, 7, 8, 9, 10]
Use the set data structure to remove duplicates and convert it back to a list:
l = [1,1,1,1,2,2,2,3,3,3,3,3,3,4,4,5,6,6,6,5,5,5,7,7,8,8,8,8,8,9,9,9,10,10]
# remove duplicates
num = set(l)
# convert the set of unique values back to a list
num = list(num)
print(num)
Output:
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
You can use itertools.groupby or something like this (I haven't used zip or slices on purpose and tried to keep it as simple as possible)
def group_by(array):
if not array:
return array
prev = array[0]
result = [prev]
for el in array:
if el != prev:
result.append(el)
prev = el
return result
P.S. Yes, with zip or enumerate or islice it can be more elegant, I know :-)
I have a python list where elements can repeat.
>>> a = [1,2,2,3,3,4,5,6]
I want to get the first n unique elements from the list.
So, in this case, if i want the first 5 unique elements, they would be:
[1,2,3,4,5]
I have come up with a solution using generators:
def iterate(itr, upper=5):
count = 0
for index, element in enumerate(itr):
if index==0:
count += 1
yield element
elif element not in itr[:index] and count<upper:
count += 1
yield element
In use:
>>> i = iterate(a, 5)
>>> [e for e in i]
[1,2,3,4,5]
I have doubts on this being the most optimal solution. Is there an alternative strategy that i can implement to write it in a more pythonic and efficient
way?
I would use a set to remember what was seen and return from the generator when you have seen enough:
a = [1, 2, 2, 3, 3, 4, 5, 6]
def get_unique_N(iterable, N):
"""Yields (in order) the first N unique elements of iterable.
Might yield less if data too short."""
seen = set()
for e in iterable:
if e in seen:
continue
seen.add(e)
yield e
if len(seen) == N:
return
k = get_unique_N([1, 2, 2, 3, 3, 4, 5, 6], 4)
print(list(k))
Output:
[1, 2, 3, 4]
According to PEP-479 you should return from generators, not raise StopIteration - thanks to #khelwood & #iBug for that piece of comment - one never learns out.
With 3.6 you get a deprecated warning, with 3.7 it gives RuntimeErrors: Transition Plan if still using raise StopIteration
Your solution using elif element not in itr[:index] and count<upper: uses O(k) lookups - with k being the length of the slice - using a set reduces this to O(1) lookups but uses more memory because the set has to be kept as well. It is a speed vs. memory tradeoff - what is better is application/data dependend.
Consider [1, 2, 3, 4, 4, 4, 4, 5] vs [1] * 1000 + [2] * 1000 + [3] * 1000 + [4] * 1000 + [5] * 1000 + [6]:
For 6 uniques (in longer list):
you would have lookups of O(1)+O(2)+...+O(5001)
mine would have 5001*O(1) lookup + memory for set( {1, 2, 3, 4, 5, 6})
You can adapt the popular itertools unique_everseen recipe:
def unique_everseen_limit(iterable, limit=5):
seen = set()
seen_add = seen.add
for element in iterable:
if element not in seen:
seen_add(element)
yield element
if len(seen) == limit:
break
a = [1,2,2,3,3,4,5,6]
res = list(unique_everseen_limit(a)) # [1, 2, 3, 4, 5]
Alternatively, as suggested by #Chris_Rands, you can use itertools.islice to extract a fixed number of values from a non-limited generator:
from itertools import islice
def unique_everseen(iterable):
seen = set()
seen_add = seen.add
for element in iterable:
if element not in seen:
seen_add(element)
yield element
res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
Note the unique_everseen recipe is available in 3rd party libraries via more_itertools.unique_everseen or toolz.unique, so you could use:
from itertools import islice
from more_itertools import unique_everseen
from toolz import unique
res = list(islice(unique_everseen(a), 5)) # [1, 2, 3, 4, 5]
res = list(islice(unique(a), 5)) # [1, 2, 3, 4, 5]
If your objects are hashable (ints are hashable) you can write utility function using fromkeys method of collections.OrderedDict class (or starting from Python3.7 a plain dict, since they became officially ordered) like
from collections import OrderedDict
def nub(iterable):
"""Returns unique elements preserving order."""
return OrderedDict.fromkeys(iterable).keys()
and then implementation of iterate can be simplified to
from itertools import islice
def iterate(itr, upper=5):
return islice(nub(itr), upper)
or if you want always a list as an output
def iterate(itr, upper=5):
return list(nub(itr))[:upper]
Improvements
As #Chris_Rands mentioned this solution walks through entire collection and we can improve this by writing nub utility in a form of generator like others already did:
def nub(iterable):
seen = set()
add_seen = seen.add
for element in iterable:
if element in seen:
continue
yield element
add_seen(element)
Here is a Pythonic approach using itertools.takewhile():
In [95]: from itertools import takewhile
In [96]: seen = set()
In [97]: set(takewhile(lambda x: seen.add(x) or len(seen) <= 4, a))
Out[97]: {1, 2, 3, 4}
You can use OrderedDict or, since Python 3.7, an ordinary dict, since they are implemented to preserve the insertion order. Note that this won't work with sets.
N = 3
a = [1, 2, 2, 3, 3, 3, 4]
d = {x: True for x in a}
list(d.keys())[:N]
There are really amazing answers for this question, which are fast, compact and brilliant! The reason I am putting here this code is that I believe there are plenty of cases when you don't care about 1 microsecond time loose nor you want additional libraries in your code for one-time solving a simple task.
a = [1,2,2,3,3,4,5,6]
res = []
for x in a:
if x not in res: # yes, not optimal, but doesnt need additional dict
res.append(x)
if len(res) == 5:
break
print(res)
Assuming the elements are ordered as shown, this is an opportunity to have fun with the groupby function in itertools:
from itertools import groupby, islice
def first_unique(data, upper):
return islice((key for (key, _) in groupby(data)), 0, upper)
a = [1, 2, 2, 3, 3, 4, 5, 6]
print(list(first_unique(a, 5)))
Updated to use islice instead of enumerate per #juanpa.arrivillaga. You don't even need a set to keep track of duplicates.
Using set with sorted+ key
sorted(set(a), key=list(a).index)[:5]
Out[136]: [1, 2, 3, 4, 5]
Given
import itertools as it
a = [1, 2, 2, 3, 3, 4, 5, 6]
Code
A simple list comprehension (similar to #cdlane's answer).
[k for k, _ in it.groupby(a)][:5]
# [1, 2, 3, 4, 5]
Alternatively, in Python 3.6+:
list(dict.fromkeys(a))[:5]
# [1, 2, 3, 4, 5]
Profiling Analysis
Solutions
Which solution is the fastest? There are two clear favorite answers (and 3 solutions) that captured most of the votes.
The solution by Patrick Artner - denoted as PA.
The first solution by jpp - denoted as jpp1
The second solution by jpp - denoted as jpp2
This is because these claim to run in O(N) while others here run in O(N^2), or do not guarantee the order of the returned list.
Experiment setup
For this experiment 3 variables were considered.
N elements. The number of first N elements the function is searching for.
List length. The longer the list the further the algorithm has to look to find the last element.
Repeat limit. How many times an element can repeat before the next element occurs in the list. This is uniformly distributed between 1 and the repeat limit.
The assumptions for data generation were as follows. How strict these are depend on the algorithm used, but is more a note on how the data was generated than a limitation on the algorithms themselves.
The elements never occur again after its repeated sequence first appears in the list.
The elements are numeric and increasing.
The elements are of type int.
So in a list of [1,1,1,2,2,3,4 ....] 1,2,3 would never appear again. The next element after 4 would be 5, but there could be a random number of 4s up to the repeat limit before we see 5.
A new dataset was created for each combination of variables and and re-generated 20 times. The python timeit function was used to profile the algorithms 50 times on each dataset. The mean time of the 20x50=1000 runs (for each combination) were reported here. Since the algorithms are generators, their outputs were converted to a list to get the execution time.
Results
As is expected the more elements searched for, the longer it takes. This graph shows that the execution time is indeed O(N) as claimed by the authors (the straight line proves this).
Fig 1. Varying the first N elements searched for.
All three solutions do not consume additional computation time beyond that which is required. The below image shows what happens when the list is limited in size, and not N elements. Lists of length 10k, with elements repeating a maximum of 100 times (and thus on average repeating 50 times) would on average run out of unique elements by 200 (10000/50). If any of these graphs showed an increase in computation time beyond 200 this would be a cause for concern.
Fig 2. The effect of first N elements chosen > number of unique elements.
The figure below again shows that processing time increases (at a rate of O(N)) the more data the algorithm has to sift through. The rate of increase is the same as when first N elements were varied. This is because stepping through the list is the common execution block in both, and the execution block that ultimately decides how fast the algorithm is.
Fig 3. Varying the repeat limit.
Conclusion
The 2nd solution posted by jpp is the fastest solution of the 3 in all cases. The solution is only slightly faster than the solution posted by Patrick Artner, and is almost twice as fast as his first solution.
Why not use something like this?
>>> a = [1, 2, 2, 3, 3, 4, 5, 6]
>>> list(set(a))[:5]
[1, 2, 3, 4, 5]
Example list:
a = [1, 2, 2, 3, 3, 4, 5, 6]
Function returns all or count of unique items needed from list
1st argument - list to work with, 2nd argument (optional) - count of unique items (by default - None - it means that all unique elements will be returned)
def unique_elements(lst, number_of_elements=None):
return list(dict.fromkeys(lst))[:number_of_elements]
Here is example how it works. List name is "a", and we need to get 2 unique elements:
print(unique_elements(a, 2))
Output:
a = [1,2,2,3,3,4,5,6]
from collections import defaultdict
def function(lis,n):
dic = defaultdict(int)
sol=set()
for i in lis:
try:
if dic[i]:
pass
else:
sol.add(i)
dic[i]=1
if len(sol)>=n:
break
except KeyError:
pass
return list(sol)
print(function(a,3))
output
[1, 2, 3]
This question already has answers here:
Wrapping around a list as a slice operation
(4 answers)
Closed 6 years ago.
I want to swap the consecutive elements of a list such that the first element will go to the last. eg [4, 7, 3, 4, 3] should print as [7,3,4,3,4]
This is the code I've written but it doesn't work right. How can I modify it to get it working?
ls = [4, 7, 3, 4, 3]
i=0
b=1
while i+1<len(ls):
ls[i]=a
ls[i]=ls[i+1]
ls[i+1]=a
i+=1
print ls
I do not want to just switch the first element with the last one. I want to modify this code further to create a bubble sort algorithm, I am just confused at how to get done what I just explained.
Updated : Thanks for the answer that I should change "ls[i]=a" with "a=ls[i]", but can someone explain to me how this differs in logic?
You don't need a loop to move the first element to the last. Just pop the first element off the list, then append it.
ls.append(ls.pop(0))
Python allows to work with lists by selecting one or several elements at the time and to concatenate the content of the list
ls = [4, 7, 3, 4, 3]
new_list_with_first_element_at_the_end = ls[1:] + [ls[0]]
If you really feel you must, make a swap function:
def swap(l, i, j):
tmp = l[j]
l[j] = i
l[i] = tmp
Then you can:
for i in range(len(l)-1):
swap(l, i, i+1)
Bit of a generic noob question. I am heavily dealing with long lists of integer/float values and wasting a lot of time.
my_list = [1,2,3,4,5,6,7,8,9,10.....] etc.
Say I want to pass a portion of that list to a function. It could be the first 3 elements, then the following 3 etc....it could also be in groups of 4,5,6....it might even be required that I take different a different amount of elements each time.
def myfunc(x,y,z):
do something
return something
What is the most efficient way to iterate by a specified number of values, as efficiency is always appreciated and these simple iterations are the places where I can gain something.
len_ml = len(my_list)
for i in range(0, len_ml, 3):
chunk = my_list[i:min(len_ml, i+3)]
This is a way. I am not sure that it is the best, though.
With a list you can get just the items you want with list[start:end].
So to skip the first list[1:] or last list[:-1] or first 3 and last 3 list[3:-3]
if you don't know how may items are in the list to start you can do a len(list) to get number. So if you had X items and wanted 3 groups :
numberofgroups = len(list) / 3
To do for loop over only certain ones:
start_index=1
end_index=-1
for item in my_list[start_index:end_index]
print item
>>>my_list = [1,2,3,4,5,6,7,8,9,10]
>>>group_len = 3 #you can change the length as per your requirement i.e 2,3,4,5,6...
>>>for i in range(0,len(my_list)):
if i*group_len < len(my_list):
my_list[i*group_len:(i+1)*group_len]
else:
break;
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10]
Result for group_len = 5
[1, 2, 3, 4, 5]
[6, 7, 8, 9, 10]