Duplicating Items within a list - python

I am fairly new to Python and am trying to figure out how to duplicate items within a list. I have tried several different things and searched for the answer extensively, but I always come up with answers for how to remove duplicate items, and I feel like I am missing something that should be fairly apparent.
I want to duplicate each item in a list, so that, for example, [1, 4, 7, 10] becomes [1, 1, 4, 4, 7, 7, 10, 10].
I know that
list = range(5)
for i in range(len(list)):
    list.insert(i+i, i)
print list
will return [0, 0, 1, 1, 2, 2, 3, 3, 4, 4] but this does not work if the items are not in order.
To provide more context I am working with audio as a list, attempting to make the audio slower.
I am working with:
def slower():
    left = Audio.getLeft()
    right = Audio.getRight()
    for i in range(len(left)):
        left.insert(????)
        right.insert(????)
Where "left" returns a list of items that are the "sounds" in the left headphone and "right" is a list of items that are sounds in the right headphone. Any help would be appreciated. Thanks.

Here is a simple way:
def slower(audio):
    return [audio[i//2] for i in range(0, len(audio)*2)]
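Applied to the question's audio lists, that might look like this (a sketch: it assumes Audio.getLeft()/Audio.getRight() return plain Python lists, and the Audio.setLeft()/Audio.setRight() setters are hypothetical -- adapt to whatever the real Audio API provides):
def slower():
    left = Audio.getLeft()
    right = Audio.getRight()
    slow_left = [left[i//2] for i in range(len(left)*2)]    # duplicate each left sample
    slow_right = [right[i//2] for i in range(len(right)*2)] # duplicate each right sample
    Audio.setLeft(slow_left)    # hypothetical setter
    Audio.setRight(slow_right)  # hypothetical setter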

Something like this works:
>>> list = [1, 32, -45, 12]
>>> for i in range(len(list)):
...     list.insert(2*i+1, list[2*i])
...
>>> list
[1, 1, 32, 32, -45, -45, 12, 12]
A few notes:
Don't use list as a variable name.
It's probably cleaner to flatten the list zipped with itself.
e.g., with list = [1, -1, 32, 42]:
>>> zip(list,list)
[(1, 1), (-1, -1), (32, 32), (42, 42)]
>>> [x for y in zip(list, list) for x in y]
[1, 1, -1, -1, 32, 32, 42, 42]
Or, you can do this whole thing lazily with itertools:
from itertools import izip, chain
for item in chain.from_iterable(izip(list, list)):
    print item
I actually like this method best of all. When I look at the code, it is the one that I immediately know what it is doing (although others may have different opinions on that).
I suppose while I'm at it, I'll just point out that we can do the same thing as above with a generator function:
def multiply_elements(iterable, ntimes=2):
    for item in iterable:
        for _ in xrange(ntimes):
            yield item
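For example (Python 2, to match the xrange above):
>>> list(multiply_elements([1, 4, 7, 10]))
[1, 1, 4, 4, 7, 7, 10, 10]
>>> list(multiply_elements([1, 4, 7], ntimes=3))
[1, 1, 1, 4, 4, 4, 7, 7, 7]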
And let's face it -- Generators are just a lot of fun. :-)

listOld = [1,4,7,10]
listNew = []
for element in listOld:
    listNew.extend([element, element])

This might not be the fastest way, but it is pretty compact (note that it needs import operator):
import operator
a = range(5)
list(reduce(operator.add, zip(a, a)))
The expression then evaluates to
[0, 0, 1, 1, 2, 2, 3, 3, 4, 4]

a = [0,1,2,3]
list(reduce(lambda x,y: x + y, zip(a,a))) #=> [0,0,1,1,2,2,3,3]
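On Python 3, reduce lives in functools; a sketch of the same idea there:
from functools import reduce
a = [0, 1, 2, 3]
list(reduce(lambda x, y: x + y, zip(a, a)))  #=> [0, 0, 1, 1, 2, 2, 3, 3]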

Related

Insert a value to a sorted list via list comprehension Python

I have an assignment to add a value to a sorted list using a list comprehension. I'm not allowed to import modules, only list comprehension, preferably a one-liner. I'm also not allowed to create functions and use them.
I'm completely in the dark with this problem. Hopefully someone can help :)
Edit: I don't need to mutate the current list. In fact, I'm trying my solution right now with .pop, I need to create a new list with the element properly added, but I still can't figure out much.
try:
    sorted_a.insert(next(i + 1 for i, (lhs, rhs) in enumerate(zip(sorted_a, sorted_a[1:])) if lhs <= value <= rhs), value)
except StopIteration:
    sorted_a.append(value)
(Note the i + 1 and the tuple unpacking: enumerate(zip(...)) yields (i, (lhs, rhs)) pairs, and the new value belongs after lhs. A value smaller than every element still lands at the end via the except branch; the comprehension below handles that case.)
I guess ....
with your new problem statement
[x for x in sorted_a if x <= value] + [value] + [y for y in sorted_a if y > value]
you could certainly improve the big-O complexity
For bisecting the list, you may use bisect.bisect (for other readers referencing the answer in future) as:
>>> from bisect import bisect
>>> my_list = [2, 4, 6, 9, 10, 15, 18, 20]
>>> num = 12
>>> index = bisect(my_list, num)
>>> my_list[:index]+[num] + my_list[index:]
[2, 4, 6, 9, 10, 12, 15, 18, 20]
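If mutating the list is acceptable, bisect.insort does the search and insertion in one call (again, only for readers who can import modules):
>>> from bisect import insort
>>> my_list = [2, 4, 6, 9, 10, 15, 18, 20]
>>> insort(my_list, 12)
>>> my_list
[2, 4, 6, 9, 10, 12, 15, 18, 20]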
But since you can not import libraries, you may use sum and zip with list comprehension expression as:
>>> my_list = [2, 4, 6, 9, 10, 15, 18, 20]
>>> num = 12
>>> sum([[i, num] if i<num<j else [i] for i, j in zip(my_list, my_list[1:])], []) + my_list[-1:]
[2, 4, 6, 9, 10, 12, 15, 18, 20]
(The + my_list[-1:] restores the final element, which pairing the list with its own tail would otherwise drop.)

Cycle through list starting at a certain element

Say I have a list:
l = [1, 2, 3, 4]
And I want to cycle through it. Normally, it would do something like this,
1, 2, 3, 4, 1, 2, 3, 4, 1, 2...
I want to be able to start at a certain point in the cycle, not necessarily an index, but perhaps matching an element. Say I wanted to start at whatever element in the list ==4, then the output would be,
4, 1, 2, 3, 4, 1, 2, 3, 4, 1...
How can I accomplish this?
Look at the itertools module. It provides all the necessary functionality.
from itertools import cycle, islice, dropwhile
L = [1, 2, 3, 4]
cycled = cycle(L) # cycle through the list 'L'
skipped = dropwhile(lambda x: x != 4, cycled) # drop the values until x==4
sliced = islice(skipped, None, 10) # take the first 10 values
result = list(sliced) # create a list from iterator
print(result)
Output:
[4, 1, 2, 3, 4, 1, 2, 3, 4, 1]
Use the arithmetic mod operator. Suppose you're starting from position k, then k should be updated like this:
k = (k + 1) % len(l)
If you want to start from a certain element, not index, you can always look it up like k = l.index(x) where x is the desired item.
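A minimal sketch of that idea (assuming the start element is actually present in the list):
l = [1, 2, 3, 4]
k = l.index(4)            # start at the element == 4
for _ in range(10):       # emit ten values, then stop
    print(l[k])
    k = (k + 1) % len(l)  # wrap around at the end of the list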
I'm not such a big fan of importing modules when you can do things by your own in a couple of lines. Here's my solution without imports:
def cycle(my_list, start_at=None):
    start_at = 0 if start_at is None else my_list.index(start_at)
    while True:
        yield my_list[start_at]
        start_at = (start_at + 1) % len(my_list)
This will return an (infinite) iterator looping over your list. To get the next element in the cycle you must use the built-in next() function:
>>> it1 = cycle([101,102,103,104])
>>> next(it1), next(it1), next(it1), next(it1), next(it1)
(101, 102, 103, 104, 101) # and so on ...
>>> it1 = cycle([101,102,103,104], start_at=103)
>>> next(it1), next(it1), next(it1), next(it1), next(it1)
(103, 104, 101, 102, 103) # and so on ...
import itertools as it
l = [1, 2, 3, 4]
list(it.islice(it.dropwhile(lambda x: x != 4, it.cycle(l)), 10))
# returns: [4, 1, 2, 3, 4, 1, 2, 3, 4, 1]
so the iterator you want is:
it.dropwhile(lambda x: x != 4, it.cycle(l))
Hm, http://docs.python.org/library/itertools.html#itertools.cycle doesn't have such a start element.
Maybe you just start the cycle anyway and drop the first elements that you don't like.
Another weird option is that cycling through lists can be accomplished backwards. For instance:
# Run this once
myList = ['foo', 'bar', 'baz', 'boom']
myItem = 'baz'
# Run this repeatedly to cycle through the list
if myItem in myList:
    myItem = myList[myList.index(myItem)-1]
print myItem
Can use something like this:
def my_cycle(data, start=None):
    k = 0 if not start else start
    while True:
        yield data[k]
        k = (k + 1) % len(data)
Then run:
for val in my_cycle([0,1,2,3], 2):
    print(val)
Essentially the same as one of the previous answers. My bad.

How to write a generator that returns ALL-BUT-LAST items in the iterable in Python?

I asked some similar questions [1, 2] yesterday and got great answers, but I am not yet technically skilled enough to write a generator of such sophistication myself.
How could I write a generator that would raise StopIteration if it's the last item, instead of yielding it?
I am thinking I should somehow ask two values at a time, and see if the 2nd value is StopIteration. If it is, then instead of yielding the first value, I should raise this StopIteration. But somehow I should also remember the 2nd value that I asked if it wasn't StopIteration.
I don't know how to write it myself. Please help.
For example, if the iterable is [1, 2, 3], then the generator should return 1 and 2.
Thanks, Boda Cydo.
[1] How do I modify a generator in Python?
[2] How to determine if the value is ONE-BUT-LAST in a Python generator?
This should do the trick:
def allbutlast(iterable):
    it = iter(iterable)
    current = it.next()
    for i in it:
        yield current
        current = i
>>> list(allbutlast([1,2,3]))
[1, 2]
This will iterate through the entire list, and return the previous item so the last item is never returned.
Note that calling the above on both [] and [1] will return an empty list.
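If you ever need to drop the last n items of an arbitrary iterable rather than just the last one, a small buffer works too; here is a sketch using collections.deque (the helper name is made up):
from collections import deque

def all_but_last_n(iterable, n=1):
    buf = deque()
    for item in iterable:
        buf.append(item)
        if len(buf) > n:
            yield buf.popleft()  # safe to yield: n newer items remain buffered
For example:
>>> list(all_but_last_n([1, 2, 3]))
[1, 2]
>>> list(all_but_last_n(range(10), 3))
[0, 1, 2, 3, 4, 5, 6]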
First off, is a generator really needed? This sounds like the perfect job for Python's slice syntax:
result = my_range[:-1]
I.e.: take a range from the first item to the one before the last. (Note this only works on sequences, though, not on arbitrary iterators.)
The itertools documentation shows a pairwise() recipe. Adapting from this recipe, you can get your generator:
from itertools import tee

def n_apart(iterable, n):
    a, b = tee(iterable)
    for count in range(n):
        next(b)
    return zip(a, b)

def all_but_n_last(iterable, n):
    return (value for value, dummy in n_apart(iterable, n))
The n_apart() function returns pairs of values which are n elements apart in the input iterable. all_but_n_last() returns the first value of each pair, which incidentally ignores the n last elements of the list.
>>> data = range(10)
>>> list(data)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> list(n_apart(data,3))
[(0, 3), (1, 4), (2, 5), (3, 6), (4, 7), (5, 8), (6, 9)]
>>> list(all_but_n_last(data,3))
[0, 1, 2, 3, 4, 5, 6]
>>>
>>> list(all_but_n_last(data,1))
[0, 1, 2, 3, 4, 5, 6, 7, 8]
The more_itertools project has a tool that emulates itertools.islice with support for negative indices:
import more_itertools as mit
list(mit.islice_extended([1, 2, 3], None, -1))
# [1, 2]
gen = (x for x in iterable[:-1])

Iteration over list slices

I want an algorithm to iterate over list slices. The slice size is set outside the function and can vary.
In my mind it is something like:
for list_of_x_items in fatherList:
    foo(list_of_x_items)
Is there a way to properly define list_of_x_items or some other way of doing this using python 2.5?
edit1: Clarification: Both "partitioning" and "sliding window" sound applicable to my task, but I am no expert. So I will explain the problem a bit more deeply and add to the question:
The fatherList is a multilevel numpy.array I am getting from a file. The function has to find averages of series (the user provides the length of the series). For averaging I am using the mean() function. Now for question expansion:
edit2: How to modify the function you have provided to store the extra items and use them when the next fatherList is fed to the function?
for example, if the list is length 10 and the chunk size is 3, then the 10th member of the list is stored and appended to the beginning of the next list.
Related:
What is the most “pythonic” way to iterate over a list in chunks?
If you want to divide a list into slices you can use this trick:
list_of_slices = zip(*(iter(the_list),) * slice_size)
For example
>>> zip(*(iter(range(10)),) * 3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8)]
If the number of items is not divisible by the slice size and you want to pad the list with None you can do this:
>>> map(None, *(iter(range(10)),) * 3)
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]
It is a dirty little trick
OK, I'll explain how it works. It'll be tricky to explain but I'll try my best.
First a little background:
In Python you can multiply a list by a number like this:
[1, 2, 3] * 3 -> [1, 2, 3, 1, 2, 3, 1, 2, 3]
([1, 2, 3],) * 3 -> ([1, 2, 3], [1, 2, 3], [1, 2, 3])
And an iterator object can be consumed once like this:
>>> l=iter([1, 2, 3])
>>> l.next()
1
>>> l.next()
2
>>> l.next()
3
The zip function returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables. For example:
zip([1, 2, 3], [20, 30, 40]) -> [(1, 20), (2, 30), (3, 40)]
zip(*[(1, 20), (2, 30), (3, 40)]) -> [(1, 2, 3), (20, 30, 40)]
The * in front of zip is used to unpack arguments. You can find more details here.
So
zip(*[(1, 20), (2, 30), (3, 40)])
is actually equivalent to
zip((1, 20), (2, 30), (3, 40))
but works with a variable number of arguments
Now back to the trick:
list_of_slices = zip(*(iter(the_list),) * slice_size)
iter(the_list) -> converts the list into an iterator
(iter(the_list),) * N -> creates a tuple holding N references to the same iterator.
zip(*(iter(the_list),) * N) -> feeds those iterator references into zip, which in turn groups them into N-sized tuples. But since all N items are in fact references to the same iterator, the result is repeated calls to next() on the original iterator.
I hope that explains it. I advise you to go with an easier-to-understand solution. I was only tempted to mention this trick because I like it.
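A quick Python 3 note (a sketch): there zip returns an iterator, so wrap the trick in list(), and the map(None, ...) padding form is gone; itertools.zip_longest covers the padded case:
from itertools import zip_longest
the_list = list(range(10))
list(zip(*(iter(the_list),) * 3))          # [(0, 1, 2), (3, 4, 5), (6, 7, 8)] -- the 9 is dropped
list(zip_longest(*(iter(the_list),) * 3))  # [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, None, None)]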
If you want to be able to consume any iterable you can use these functions:
from itertools import chain, islice
def ichunked(seq, chunksize):
    """Yields items from an iterator in iterable chunks."""
    it = iter(seq)
    while True:
        yield chain([it.next()], islice(it, chunksize-1))

def chunked(seq, chunksize):
    """Yields items from an iterator in list chunks."""
    for chunk in ichunked(seq, chunksize):
        yield list(chunk)
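For example (Python 2, matching the it.next() call above):
>>> for chunk in chunked('abcdefg', 3):
...     print chunk
['a', 'b', 'c']
['d', 'e', 'f']
['g']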
Use a generator:
big_list = [1,2,3,4,5,6,7,8,9]
slice_length = 3
def sliceIterator(lst, sliceLen):
    for i in range(len(lst) - sliceLen + 1):
        yield lst[i:i + sliceLen]

for slice in sliceIterator(big_list, slice_length):
    foo(slice)
sliceIterator implements a "sliding window" of width sliceLen over the sequence lst, i.e. it produces overlapping slices: [1,2,3], [2,3,4], [3,4,5], ... Not sure if that is the OP's intention, though.
Do you mean something like:
def callonslices(size, fatherList, foo):
    for i in xrange(0, len(fatherList), size):
        foo(fatherList[i:i+size])
If this is roughly the functionality you want you might, if you desire, dress it up a bit in a generator:
def sliceup(size, fatherList):
    for i in xrange(0, len(fatherList), size):
        yield fatherList[i:i+size]
and then:
def callonslices(size, fatherList, foo):
    for sli in sliceup(size, fatherList):
        foo(sli)
Answer to the last part of the question:
question update: How to modify the function you have provided to store the extra items and use them when the next fatherList is fed to the function?
If you need to store state then you can use an object for that.
class Chunker(object):
    """Split `iterable` on evenly sized chunks.
    Leftovers are remembered and yielded at the next call.
    """
    def __init__(self, chunksize):
        assert chunksize > 0
        self.chunksize = chunksize
        self.chunk = []
    def __call__(self, iterable):
        """Yield items from `iterable` `self.chunksize` at a time."""
        assert len(self.chunk) < self.chunksize
        for item in iterable:
            self.chunk.append(item)
            if len(self.chunk) == self.chunksize:
                # yield collected full chunk
                yield self.chunk
                self.chunk = []
Example:
chunker = Chunker(3)
for s in "abcd", "efgh":
    for chunk in chunker(s):
        print ''.join(chunk)
if chunker.chunk: # is there anything left?
    print ''.join(chunker.chunk)
Output:
abc
def
gh
I am not sure, but it seems you want to do what is called a moving average. numpy provides facilities for this (the convolve function).
>>> import numpy
>>> x = numpy.array(range(20))
>>> x
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19])
>>> n = 2 # moving average window
>>> numpy.convolve(numpy.ones(n)/n, x)[n-1:-n+1]
array([ 0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5, 8.5,
9.5, 10.5, 11.5, 12.5, 13.5, 14.5, 15.5, 16.5, 17.5, 18.5])
The nice thing is that it accommodates different weighting schemes nicely (just change numpy.ones(n) / n to something else).
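For instance, a triangular window would look like this (a sketch; the weights just have to sum to 1):
>>> w = numpy.array([0.25, 0.5, 0.25])   # hypothetical weights summing to 1
>>> numpy.convolve(w, x)[len(w)-1:-len(w)+1]
array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        12.,  13.,  14.,  15.,  16.,  17.,  18.])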
You can find complete material here:
http://www.scipy.org/Cookbook/SignalSmooth
Expanding on the answer of @Ants Aasma: In Python 3.7 the handling of the StopIteration exception changed (according to PEP 479). A compatible version would be:
from itertools import chain, islice
def ichunked(seq, chunksize):
    it = iter(seq)
    while True:
        try:
            yield chain([next(it)], islice(it, chunksize - 1))
        except StopIteration:
            return
Your question could use some more detail, but how about:
def iterate_over_slices(the_list, slice_size):
    for start in range(0, len(the_list) - slice_size + 1):  # +1 so the final window is included
        slice = the_list[start:start + slice_size]
        foo(slice)
For a near one-liner (after the itertools import) in the vein of Nadia's answer, dealing with sizes that are not chunk-divisible, without padding:
>>> import itertools as itt
>>> chunksize = 5
>>> myseq = range(18)
>>> cnt = itt.count()
>>> print [ tuple(grp) for k,grp in itt.groupby(myseq, key=lambda x: cnt.next()//chunksize%2)]
[(0, 1, 2, 3, 4), (5, 6, 7, 8, 9), (10, 11, 12, 13, 14), (15, 16, 17)]
If you want, you can get rid of the itertools.count() requirement using enumerate(), with a rather uglier:
[ [e[1] for e in grp] for k,grp in itt.groupby(enumerate(myseq), key=lambda x: x[0]//chunksize%2) ]
(In this example the enumerate() would be superfluous, but not all sequences are neat ranges like this, obviously)
Nowhere near as neat as some other answers, but useful in a pinch, especially if already importing itertools.
A function that slices a list or an iterator into chunks of a given size. Also handles the case correctly if the last chunk is smaller:
def slice_iterator(data, slice_len):
    it = iter(data)
    while True:
        items = []
        for index in range(slice_len):
            try:
                item = next(it)
            except StopIteration:
                if items == []:
                    return  # we are done
                else:
                    break  # exits the "for" loop
            items.append(item)
        yield items
Usage example:
for slice in slice_iterator([1,2,3,4,5,6,7,8,9,10], 3):
    print(slice)
Result:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
[10]

In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique *while preserving order*? [duplicate]

This question already has answers here:
How do I remove duplicates from a list, while preserving order?
(31 answers)
Closed 8 years ago.
For example:
>>> x = [1, 1, 2, 'a', 'a', 3]
>>> unique(x)
[1, 2, 'a', 3]
Assume list elements are hashable.
Clarification: The result should keep the first duplicate in the list. For example, [1, 2, 3, 2, 3, 1] becomes [1, 2, 3].
def unique(items):
    found = set()
    keep = []
    for item in items:
        if item not in found:
            found.add(item)
            keep.append(item)
    return keep

print unique([1, 1, 2, 'a', 'a', 3])
Using:
lst = [8, 8, 9, 9, 7, 15, 15, 2, 20, 13, 2, 24, 6, 11, 7, 12, 4, 10, 18, 13, 23, 11, 3, 11, 12, 10, 4, 5, 4, 22, 6, 3, 19, 14, 21, 11, 1, 5, 14, 8, 0, 1, 16, 5, 10, 13, 17, 1, 16, 17, 12, 6, 10, 0, 3, 9, 9, 3, 7, 7, 6, 6, 7, 5, 14, 18, 12, 19, 2, 8, 9, 0, 8, 4, 5]
And using the timeit module:
$ python -m timeit -s 'import uniquetest' 'uniquetest.etchasketch(uniquetest.lst)'
And so on for the various other functions (which I named after their posters), I have the following results (on my first generation Intel MacBook Pro):
Allen: 14.6 µs per loop [1]
Terhorst: 26.6 µs per loop
Tarle: 44.7 µs per loop
ctcherry: 44.8 µs per loop
Etchasketch 1 (short): 64.6 µs per loop
Schinckel: 65.0 µs per loop
Etchasketch 2: 71.6 µs per loop
Little: 89.4 µs per loop
Tyler: 179.0 µs per loop
[1] Note that Allen modifies the list in place – I believe this has skewed the time, in that the timeit module runs the code 100000 times and 99999 of them are with the dupe-less list.
Summary: Straight-forward implementation with sets wins over confusing one-liners :-)
Update: on Python3.7+:
>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']
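Applied to the question's example:
>>> x = [1, 1, 2, 'a', 'a', 3]
>>> list(dict.fromkeys(x))
[1, 2, 'a', 3]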
old answer:
Here is the fastest solution so far (for the following input):
def del_dups(seq):
    seen = {}
    pos = 0
    for item in seq:
        if item not in seen:
            seen[item] = True
            seq[pos] = item
            pos += 1
    del seq[pos:]
lst = [8, 8, 9, 9, 7, 15, 15, 2, 20, 13, 2, 24, 6, 11, 7, 12, 4, 10, 18,
13, 23, 11, 3, 11, 12, 10, 4, 5, 4, 22, 6, 3, 19, 14, 21, 11, 1,
5, 14, 8, 0, 1, 16, 5, 10, 13, 17, 1, 16, 17, 12, 6, 10, 0, 3, 9,
9, 3, 7, 7, 6, 6, 7, 5, 14, 18, 12, 19, 2, 8, 9, 0, 8, 4, 5]
del_dups(lst)
print(lst)
# -> [8, 9, 7, 15, 2, 20, 13, 24, 6, 11, 12, 4, 10, 18, 23, 3, 5, 22, 19, 14,
# 21, 1, 0, 16, 17]
Dictionary lookup is slightly faster than the set's in Python 3.
What's going to be fastest depends on what percentage of your list is duplicates. If it's nearly all duplicates, with few unique items, creating a new list will probably be faster. If it's mostly unique items, removing them from the original list (or a copy) will be faster.
Here's one for modifying the list in place:
def unique(items):
    seen = set()
    for i in xrange(len(items)-1, -1, -1):
        it = items[i]
        if it in seen:
            del items[i]
        else:
            seen.add(it)
Iterating backwards over the indices ensures that removing items doesn't affect the iteration.
This is the fastest in-place method I've found (assuming a large proportion of duplicates):
def unique(l):
    s = set(); n = 0
    for x in l:
        if x not in s: s.add(x); l[n] = x; n += 1
    del l[n:]
This is 10% faster than Allen's implementation, on which it is based (timed with timeit.repeat, JIT compiled by psyco). It keeps the first instance of any duplicate.
repton-infinity: I'd be interested if you could confirm my timings.
Obligatory generator-based variation:
def unique(seq):
    seen = set()
    for x in seq:
        if x not in seen:
            seen.add(x)
            yield x
This may be the simplest way:
list(OrderedDict.fromkeys(iterable))
As of Python 3.5, OrderedDict is now implemented in C, so this is now the shortest, cleanest, and fastest.
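For example (OrderedDict lives in the collections module):
>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys([1, 1, 2, 'a', 'a', 3]))
[1, 2, 'a', 3]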
Taken from http://www.peterbe.com/plog/uniqifiers-benchmark
def f5(seq, idfun=None):
    # order preserving
    if idfun is None:
        def idfun(x): return x
    seen = {}
    result = []
    for item in seq:
        marker = idfun(item)
        # in old Python versions:
        # if seen.has_key(marker)
        # but in new ones:
        if marker in seen: continue
        seen[marker] = 1
        result.append(item)
    return result
One-liner:
new_list = reduce(lambda x,y: x+[y][:1-int(y in x)], my_list, [])
An in-place one-liner for this:
>>> x = [1, 1, 2, 'a', 'a', 3]
>>> [ item for pos,item in enumerate(x) if x.index(item)==pos ]
[1, 2, 'a', 3]
This is the fastest one, comparing all the stuff from this lengthy discussion and the other answers given here, referring to this benchmark. It's another 25% faster than the fastest function from the discussion, f8. Thanks to David Kirby for the idea.
def uniquify(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if x not in seen and not seen_add(x)]
Some time comparison:
$ python uniqifiers_benchmark.py
* f8_original 3.76
* uniquify 3.0
* terhorst 5.44
* terhorst_localref 4.08
* del_dups 4.76
You can actually do something really cool in Python to solve this. You can create a list comprehension that would reference itself as it is being built. As follows:
# remove duplicates...
def unique(my_list):
    return [x for x in my_list if x not in locals()['_[1]'].__self__]
Edit: I removed the "self", and it works on Mac OS X, Python 2.5.1.
The _[1] is Python's "secret" reference to the new list. The above, of course, is a little messy, but you could adapt it to fit your needs as necessary. For example, you can actually write a function that returns a reference to the comprehension; it would look more like:
return [x for x in my_list if x not in this_list()]
Do the duplicates necessarily need to be in the list in the first place? There's no overhead as far as looking the elements up, but there is a little bit more overhead in adding elements (though the overhead should be O(1) ).
>>> x = []
>>> y = set()
>>> def add_to_x(val):
...     if val not in y:
...         x.append(val)
...         y.add(val)
...     print x
...     print y
...
>>> add_to_x(1)
[1]
set([1])
>>> add_to_x(1)
[1]
set([1])
>>> add_to_x(1)
[1]
set([1])
>>>
Remove duplicates and preserve order:
This is a fast 2-liner that leverages built-in functionality of list comprehensions and dicts.
x = [1, 1, 2, 'a', 'a', 3]
tmpUniq = {} # temp variable used below
results = [tmpUniq.setdefault(i,i) for i in x if i not in tmpUniq]
print results
[1, 2, 'a', 3]
The dict.setdefault() function returns the value as well as adding it to the temp dict, directly in the list comprehension. Using the built-in functions and the dict's hashing maximizes efficiency for the process.
O(n) if the dict is a hash, O(n log n) if it is a tree; simple and fixed. Thanks to Matthew for the suggestion. (Python dicts are in fact hash tables, so this is O(n).)
def unique(x):
    output = []
    y = {}
    for item in x:
        y[item] = ""
    for item in x:
        if item in y:
            output.append(item)
            del y[item]  # forget the key so later duplicates are skipped
    return output
has_key in python is O(1). Insertion and retrieval from a hash is also O(1). Loops through n items twice, so O(n).
def unique(list):
    s = {}
    output = []
    for x in list:
        count = 1
        if(s.has_key(x)):
            count = s[x] + 1
        s[x] = count
    for x in list:
        count = s[x]
        if(count > 0):
            s[x] = 0
            output.append(x)
    return output
There are some great, efficient solutions here. However, for anyone not concerned with the absolute most efficient O(n) solution, I'd go with the simple one-liner O(n^2) solution (sorted computes each key only once, so the n calls to index dominate):
def unique(xs):
    return sorted(set(xs), key=lambda x: xs.index(x))
or the more efficient two-liner O(n*log(n)) solution:
def unique(xs):
    positions = dict((e, pos) for pos, e in reversed(list(enumerate(xs))))
    return sorted(set(xs), key=lambda x: positions[x])
(Building positions from the reversed enumeration means earlier occurrences overwrite later ones, so each element maps to its first position.)
Here are two recipes from the itertools documentation (with the imports they assume, in Python 2 form):
from itertools import groupby, ifilterfalse, imap
from operator import itemgetter

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in ifilterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

def unique_justseen(iterable, key=None):
    "List unique elements, preserving order. Remember only the element just seen."
    # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
    # unique_justseen('ABBCcAD', str.lower) --> A B C A D
    return imap(next, imap(itemgetter(1), groupby(iterable, key)))
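For example:
>>> list(unique_everseen('AAAABBBCCDAABBB'))
['A', 'B', 'C', 'D']
>>> list(unique_justseen('AAAABBBCCDAABBB'))
['A', 'B', 'C', 'D', 'A', 'B']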
I have no experience with python, but an algorithm would be to sort the list, then remove duplicates (by comparing to previous items in the list), and finally find the position in the new list by comparing with the old list.
Longer answer: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560
>>> def unique(list):
...     y = []
...     for x in list:
...         if x not in y:
...             y.append(x)
...     return y
If you take out the empty list from the call to set() in Terhorst's answer, you get a little speed boost.
Change:
found = set([])
to:
found = set()
However, you don't need the set at all.
def unique(items):
    keep = []
    for item in items:
        if item not in keep:
            keep.append(item)
    return keep
Using timeit I got these results:
with set([]) -- 4.97210427363
with set() -- 4.65712377445
with no set -- 3.44865284975
x = [] # Your list of items that includes duplicates
# Assuming that your list contains items of only immutable (hashable) data types
dict_x = {item: item for item in x}
# dict keys are unique, so duplicates collapse automatically; on Python 3.7+ insertion
# order is preserved and a repeated key keeps its first position. Average t.c. = O(n).
x = list(dict_x) # if you want your output in list format
>>> x=[1,1,2,'a','a',3]
>>> y = [ _x for _x in x if not _x in locals()['_[1]'] ]
>>> y
[1, 2, 'a', 3]
"locals()['_[1]']" is the "secret name" of the list being created.
I don't know if this one is fast or not, but at least it is simple.
Simply convert it first to a set and then back to a list:
def unique(container):
    return list(set(container))
(Note: this does not preserve the original order, which the question requires.)
One pass. (Note: this only removes consecutive duplicates, and it consumes the input list.)
a = [1,1,'a','b','c','c']
new_list = []
prev = None
while 1:
    try:
        i = a.pop(0)
        if i != prev:
            new_list.append(i)
            prev = i
    except IndexError:
        break
I haven't done any tests, but one possible algorithm might be to create a second list, and iterate through the first list. If an item is not in the second list, add it to the second list.
x = [1, 1, 2, 'a', 'a', 3]
y = []
for each in x:
    if each not in y:
        y.append(each)
a = [1,2,3,4,5,7,7,8,8,9,9,3,45]
def unique(l):
    ids = {}
    for item in l:
        if not ids.has_key(item):
            ids[item] = item
    return ids.keys()
print a
print unique(a)
(Note: dict key order is arbitrary on Python 2, so this does not preserve the original order.)
Inserting all the elements takes Θ(n) (each individual insert is constant time).
Checking whether an element exists takes constant time.
Testing all the items therefore also takes Θ(n),
so we can see that this solution is Θ(n) overall.
Bear in mind that dictionaries in Python are implemented as hash tables.
