Related
I need to pick out "x" number of non-repeating, random numbers out of a list. For example:
all_data = [1, 2, 2, 3, 4, 5, 6, 7, 8, 8, 9, 10, 11, 11, 12, 13, 14, 15, 15]
How do I pick out a list like [2, 11, 15] and not [3, 8, 8]?
That's exactly what random.sample() does.
>>> random.sample(range(1, 16), 3)
[11, 10, 2]
Edit: I'm almost certain this is not what you asked, but I was pushed to include this comment: If the population you want to take samples from contains duplicates, you have to remove them first:
population = [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
population = list(set(population))
samples = random.sample(population, 3)
Something like this:
all_data = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
from random import shuffle
shuffle(all_data)
res = all_data[:3]# or any other number of items
OR:
from random import sample
number_of_items = 4
sample(all_data, number_of_items)
If all_data could contains duplicate entries than modify your code to remove duplicates first and then use shuffle or sample:
all_data = list(set(all_data))
shuffle(all_data)
res = all_data[:3]# or any other number of items
Others have suggested that you use random.sample. While this is a valid suggestion, there is one subtlety that everyone has ignored:
If the population contains repeats,
then each occurrence is a possible
selection in the sample.
Thus, you need to turn your list into a set, to avoid repeated values:
import random
L = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15]
random.sample(set(L), x) # where x is the number of samples that you want
Another way, of course with all the solutions you have to be sure that there are at least 3 unique values in the original list. all_data = [1,2,2,3,4,5,6,7,8,8,9,10,11,11,12,13,14,15,15]
choices = []
while len(choices) < 3:
selection = random.choice(all_data)
if selection not in choices:
choices.append(selection)
print choices
You can also generate a list of random choices, using itertools.combinations and random.shuffle.
all_data = [1,2,2,3,4,5,6,7,8,8,9,10,11,11,12,13,14,15,15]
# Remove duplicates
unique_data = set(all_data)
# Generate a list of combinations of three elements
list_of_three = list(itertools.combinations(unique_data, 3))
# Shuffle the list of combinations of three elements
random.shuffle(list_of_three)
Output:
[(2, 5, 15), (11, 13, 15), (3, 10, 15), (1, 6, 9), (1, 7, 8), ...]
import random
fruits_in_store = ['apple','mango','orange','pineapple','fig','grapes','guava','litchi','almond']
print('items available in store :')
print(fruits_in_store)
my_cart = []
for i in range(4):
#selecting a random index
temp = int(random.random()*len(fruits_in_store))
# adding element at random index to new list
my_cart.append(fruits_in_store[temp])
# removing the add element from original list
fruits_in_store.pop(temp)
print('items successfully added to cart:')
print(my_cart)
Output:
items available in store :
['apple', 'mango', 'orange', 'pineapple', 'fig', 'grapes', 'guava', 'litchi', 'almond']
items successfully added to cart:
['orange', 'pineapple', 'mango', 'almond']
If the data being repeated implies that we are more likely to draw that particular data, we can't turn it into a set right away (since we would loose that information by doing so). For this, we need to pick samples one by one and verify the size of the set that we generate has reached x (the number of samples that we want). Something like:
data=[0, 1, 2, 3, 4, 4, 4, 4, 5, 5, 6, 6]
x=3
res=set()
while(len(res)<x):
res.add(np.random.choice(data))
print(res)
some outputs :
{3, 4, 5}
{3, 5, 6}
{0, 4, 5}
{2, 4, 5}
As we can see 4 or 5 appear more frequently (I know 4 examples is not good enough statistics).
I am trying to solve a assignment where are 13 lights and starting from 1, light is turned off at every 5th light, when the count reaches 13, start from 1st item again. The function should return the order of lights turned off. In this case, for a list of 13 items, the return list would be [5, 10, 2, 8, 1, 9, 4, 13, 12, 3, 7, 11, 6]. Also, turned off lights would not count again.
So the way I was going to approach this problem was to have a list named turnedon, which is [1,2,3,4,5,6,7,8,9,10,11,12,13] and an empty list called orderoff and append to this list whenever a light gets turned off in the turnedon list. So while the turnedon is not empty, iterate through the turnedon list and append the light getting turned off and remove that turnedoff light from the turnedon list, if that makes sense. I cannot figure out what should go into the while loop though. Any idea would be really appreciated.
def orderoff():
n=13
turnedon=[]
for n in range(1,n+1):
turnedon.append(n)
orderoff=[]
while turneon !=[]:
This problem is equivalent to the well-known Josephus problem, in which n prisoners stand in a circle, and they are killed in a sequence where each time, the next person to be killed is k steps around the circle from the previous person; the steps are only counted over the remaining prisoners. A sample solution in Python can be found on the Rosetta code website, which I've adapted slightly below:
def josephus(n, k):
p = list(range(1, n+1))
i = 0
seq = []
while p:
i = (i+k-1) % len(p)
seq.append(p.pop(i))
return seq
Example:
>>> josephus(13, 5)
[5, 10, 2, 8, 1, 9, 4, 13, 12, 3, 7, 11, 6]
This works, but the results are different from yours:
>>> pos = 0
>>> result = []
>>> while len(result) < 13 :
... pos += 5
... pos %= 13
... if pos not in result :
... result.append(pos)
...
>>> result = [i+1 for i in result] # make it 1-based, not 0-based
>>> result
[6, 11, 3, 8, 13, 5, 10, 2, 7, 12, 4, 9, 1]
>>>
I think a more optimal solution would be to use a loop, add the displacement each time, and use modules to keep the number in range
def orderoff(lights_num,step):
turnd_off=[]
num =0
for i in range(max):
num =((num+step-1)%lights_num)+1
turnd_off.append(num)
return turnd_off
print(orderoff(13))
I want to print the top 10 distinct elements from a list:
top=10
test=[1,1,1,2,3,4,5,6,7,8,9,10,11,12,13]
for i in range(0,top):
if test[i]==1:
top=top+1
else:
print(test[i])
It is printing:
2,3,4,5,6,7,8
I am expecting:
2,3,4,5,6,7,8,9,10,11
What I am missing?
Using numpy
import numpy as np
top=10
test=[1,1,1,2,3,4,5,6,7,8,9,10,11,12,13]
test=np.unique(np.array(test))
test[test!=1][:top]
Output
array([ 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Since you code only executes the loop for 10 times and the first 3 are used to ignore 1, so only the following 3 is printed, which is exactly happened here.
If you want to print the top 10 distinct value, I recommand you to do this:
# The code of unique is taken from [remove duplicates in list](https://stackoverflow.com/questions/7961363/removing-duplicates-in-lists)
def unique(l):
return list(set(l))
def print_top_unique(List, top):
ulist = unique(List)
for i in range(0, top):
print(ulist[i])
print_top_unique([1, 1, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13], 10)
My Solution
test = [1,1,1,2,3,4,5,6,7,8,9,10,11,12,13]
uniqueList = [num for num in set(test)] #creates a list of unique characters [1,2,3,4,5,6,7,8,9,10,11,12,13]
for num in range(0,11):
if uniqueList[num] != 1: #skips one, since you wanted to start with two
print(uniqueList[num])
I have an assignment to add a value to a sorted list using list comprehension. I'm not allowed to import modules, only list comprehension, preferably a one liner. I'm not allowed to create functions and use them aswell.
I'm completely in the dark with this problem. Hopefully someone can help :)
Edit: I don't need to mutate the current list. In fact, I'm trying my solution right now with .pop, I need to create a new list with the element properly added, but I still can't figure out much.
try:
sorted_a.insert(next(i for i,lhs,rhs in enumerate(zip(sorted_a,sorted_a[1:])) if lhs <= value <= rhs),value)
except StopIteration:
sorted_a.append(value)
I guess ....
with your new problem statement
[x for x in sorted_a if x <= value] + [value,] + [y for y in sorted_a if y >= value]
you could certainly improve the big-O complexity
For bisecting the list, you may use bisect.bisect (for other readers referencing the answer in future) as:
>>> from bisect import bisect
>>> my_list = [2, 4, 6, 9, 10, 15, 18, 20]
>>> num = 12
>>> index = bisect(my_list, num)
>>> my_list[:index]+[num] + my_list[index:]
[2, 4, 6, 9, 10, 12, 15, 18, 20]
But since you can not import libraries, you may use sum and zip with list comprehension expression as:
>>> my_list = [2, 4, 6, 9, 10, 15, 18, 20]
>>> num = 12
>>> sum([[i, num] if i<num<j else [i] for i, j in zip(my_list,my_list[1:])], [])
[2, 4, 6, 9, 10, 12, 15, 18]
This question already has answers here:
How do I remove duplicates from a list, while preserving order?
(31 answers)
Closed 8 years ago.
For example:
>>> x = [1, 1, 2, 'a', 'a', 3]
>>> unique(x)
[1, 2, 'a', 3]
Assume list elements are hashable.
Clarification: The result should keep the first duplicate in the list. For example, [1, 2, 3, 2, 3, 1] becomes [1, 2, 3].
def unique(items):
found = set()
keep = []
for item in items:
if item not in found:
found.add(item)
keep.append(item)
return keep
print unique([1, 1, 2, 'a', 'a', 3])
Using:
lst = [8, 8, 9, 9, 7, 15, 15, 2, 20, 13, 2, 24, 6, 11, 7, 12, 4, 10, 18, 13, 23, 11, 3, 11, 12, 10, 4, 5, 4, 22, 6, 3, 19, 14, 21, 11, 1, 5, 14, 8, 0, 1, 16, 5, 10, 13, 17, 1, 16, 17, 12, 6, 10, 0, 3, 9, 9, 3, 7, 7, 6, 6, 7, 5, 14, 18, 12, 19, 2, 8, 9, 0, 8, 4, 5]
And using the timeit module:
$ python -m timeit -s 'import uniquetest' 'uniquetest.etchasketch(uniquetest.lst)'
And so on for the various other functions (which I named after their posters), I have the following results (on my first generation Intel MacBook Pro):
Allen: 14.6 µs per loop [1]
Terhorst: 26.6 µs per loop
Tarle: 44.7 µs per loop
ctcherry: 44.8 µs per loop
Etchasketch 1 (short): 64.6 µs per loop
Schinckel: 65.0 µs per loop
Etchasketch 2: 71.6 µs per loop
Little: 89.4 µs per loop
Tyler: 179.0 µs per loop
[1] Note that Allen modifies the list in place – I believe this has skewed the time, in that the timeit module runs the code 100000 times and 99999 of them are with the dupe-less list.
Summary: Straight-forward implementation with sets wins over confusing one-liners :-)
Update: on Python3.7+:
>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']
old answer:
Here is the fastest solution so far (for the following input):
def del_dups(seq):
seen = {}
pos = 0
for item in seq:
if item not in seen:
seen[item] = True
seq[pos] = item
pos += 1
del seq[pos:]
lst = [8, 8, 9, 9, 7, 15, 15, 2, 20, 13, 2, 24, 6, 11, 7, 12, 4, 10, 18,
13, 23, 11, 3, 11, 12, 10, 4, 5, 4, 22, 6, 3, 19, 14, 21, 11, 1,
5, 14, 8, 0, 1, 16, 5, 10, 13, 17, 1, 16, 17, 12, 6, 10, 0, 3, 9,
9, 3, 7, 7, 6, 6, 7, 5, 14, 18, 12, 19, 2, 8, 9, 0, 8, 4, 5]
del_dups(lst)
print(lst)
# -> [8, 9, 7, 15, 2, 20, 13, 24, 6, 11, 12, 4, 10, 18, 23, 3, 5, 22, 19, 14,
# 21, 1, 0, 16, 17]
Dictionary lookup is slightly faster then the set's one in Python 3.
What's going to be fastest depends on what percentage of your list is duplicates. If it's nearly all duplicates, with few unique items, creating a new list will probably be faster. If it's mostly unique items, removing them from the original list (or a copy) will be faster.
Here's one for modifying the list in place:
def unique(items):
seen = set()
for i in xrange(len(items)-1, -1, -1):
it = items[i]
if it in seen:
del items[i]
else:
seen.add(it)
Iterating backwards over the indices ensures that removing items doesn't affect the iteration.
This is the fastest in-place method I've found (assuming a large proportion of duplicates):
def unique(l):
s = set(); n = 0
for x in l:
if x not in s: s.add(x); l[n] = x; n += 1
del l[n:]
This is 10% faster than Allen's implementation, on which it is based (timed with timeit.repeat, JIT compiled by psyco). It keeps the first instance of any duplicate.
repton-infinity: I'd be interested if you could confirm my timings.
Obligatory generator-based variation:
def unique(seq):
seen = set()
for x in seq:
if x not in seen:
seen.add(x)
yield x
This may be the simplest way:
list(OrderedDict.fromkeys(iterable))
As of Python 3.5, OrderedDict is now implemented in C, so this was is now the shortest, cleanest, and fastest.
Taken from http://www.peterbe.com/plog/uniqifiers-benchmark
def f5(seq, idfun=None):
# order preserving
if idfun is None:
def idfun(x): return x
seen = {}
result = []
for item in seq:
marker = idfun(item)
# in old Python versions:
# if seen.has_key(marker)
# but in new ones:
if marker in seen: continue
seen[marker] = 1
result.append(item)
return result
One-liner:
new_list = reduce(lambda x,y: x+[y][:1-int(y in x)], my_list, [])
An in-place one-liner for this:
>>> x = [1, 1, 2, 'a', 'a', 3]
>>> [ item for pos,item in enumerate(x) if x.index(item)==pos ]
[1, 2, 'a', 3]
This is the fastest one, comparing all the stuff from this lengthy discussion and the other answers given here, refering to this benchmark. It's another 25% faster than the fastest function from the discussion, f8. Thanks to David Kirby for the idea.
def uniquify(seq):
seen = set()
seen_add = seen.add
return [x for x in seq if x not in seen and not seen_add(x)]
Some time comparison:
$ python uniqifiers_benchmark.py
* f8_original 3.76
* uniquify 3.0
* terhorst 5.44
* terhorst_localref 4.08
* del_dups 4.76
You can actually do something really cool in Python to solve this. You can create a list comprehension that would reference itself as it is being built. As follows:
# remove duplicates...
def unique(my_list):
return [x for x in my_list if x not in locals()['_[1]'].__self__]
Edit: I removed the "self", and it works on Mac OS X, Python 2.5.1.
The _[1] is Python's "secret" reference to the new list. The above, of course, is a little messy, but you could adapt it fit your needs as necessary. For example, you can actually write a function that returns a reference to the comprehension; it would look more like:
return [x for x in my_list if x not in this_list()]
Do the duplicates necessarily need to be in the list in the first place? There's no overhead as far as looking the elements up, but there is a little bit more overhead in adding elements (though the overhead should be O(1) ).
>>> x = []
>>> y = set()
>>> def add_to_x(val):
... if val not in y:
... x.append(val)
... y.add(val)
... print x
... print y
...
>>> add_to_x(1)
[1]
set([1])
>>> add_to_x(1)
[1]
set([1])
>>> add_to_x(1)
[1]
set([1])
>>>
Remove duplicates and preserve order:
This is a fast 2-liner that leverages built-in functionality of list comprehensions and dicts.
x = [1, 1, 2, 'a', 'a', 3]
tmpUniq = {} # temp variable used below
results = [tmpUniq.setdefault(i,i) for i in x if i not in tmpUniq]
print results
[1, 2, 'a', 3]
The dict.setdefaults() function returns the value as well as adding it to the temp dict directly in the list comprehension. Using the built-in functions and the hashes of the dict will work to maximize efficiency for the process.
O(n) if dict is hash, O(nlogn) if dict is tree, and simple, fixed. Thanks to Matthew for the suggestion. Sorry I don't know the underlying types.
def unique(x):
output = []
y = {}
for item in x:
y[item] = ""
for item in x:
if item in y:
output.append(item)
return output
has_key in python is O(1). Insertion and retrieval from a hash is also O(1). Loops through n items twice, so O(n).
def unique(list):
s = {}
output = []
for x in list:
count = 1
if(s.has_key(x)):
count = s[x] + 1
s[x] = count
for x in list:
count = s[x]
if(count > 0):
s[x] = 0
output.append(x)
return output
There are some great, efficient solutions here. However, for anyone not concerned with the absolute most efficient O(n) solution, I'd go with the simple one-liner O(n^2*log(n)) solution:
def unique(xs):
return sorted(set(xs), key=lambda x: xs.index(x))
or the more efficient two-liner O(n*log(n)) solution:
def unique(xs):
positions = dict((e,pos) for pos,e in reversed(list(enumerate(xs))))
return sorted(set(xs), key=lambda x: positions[x])
Here are two recipes from the itertools documentation:
def unique_everseen(iterable, key=None):
"List unique elements, preserving order. Remember all elements ever seen."
# unique_everseen('AAAABBBCCDAABBB') --> A B C D
# unique_everseen('ABBCcAD', str.lower) --> A B C D
seen = set()
seen_add = seen.add
if key is None:
for element in ifilterfalse(seen.__contains__, iterable):
seen_add(element)
yield element
else:
for element in iterable:
k = key(element)
if k not in seen:
seen_add(k)
yield element
def unique_justseen(iterable, key=None):
"List unique elements, preserving order. Remember only the element just seen."
# unique_justseen('AAAABBBCCDAABBB') --> A B C D A B
# unique_justseen('ABBCcAD', str.lower) --> A B C A D
return imap(next, imap(itemgetter(1), groupby(iterable, key)))
I have no experience with python, but an algorithm would be to sort the list, then remove duplicates (by comparing to previous items in the list), and finally find the position in the new list by comparing with the old list.
Longer answer: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/52560
>>> def unique(list):
... y = []
... for x in list:
... if x not in y:
... y.append(x)
... return y
If you take out the empty list from the call to set() in Terhost's answer, you get a little speed boost.
Change:
found = set([])
to:
found = set()
However, you don't need the set at all.
def unique(items):
keep = []
for item in items:
if item not in keep:
keep.append(item)
return keep
Using timeit I got these results:
with set([]) -- 4.97210427363
with set() -- 4.65712377445
with no set -- 3.44865284975
x = [] # Your list of items that includes Duplicates
# Assuming that your list contains items of only immutable data types
dict_x = {}
dict_x = {item : item for i, item in enumerate(x) if item not in dict_x.keys()}
# Average t.c. = O(n)* O(1) ; furthermore the dict comphrehension and generator like behaviour of enumerate adds a certain efficiency and pythonic feel to it.
x = dict_x.keys() # if you want your output in list format
>>> x=[1,1,2,'a','a',3]
>>> y = [ _x for _x in x if not _x in locals()['_[1]'] ]
>>> y
[1, 2, 'a', 3]
"locals()['_[1]']" is the "secret name" of the list being created.
I don't know if this one is fast or not, but at least it is simple.
Simply, convert it first to a set and then again to a list
def unique(container):
return list(set(container))
One pass.
a = [1,1,'a','b','c','c']
new_list = []
prev = None
while 1:
try:
i = a.pop(0)
if i != prev:
new_list.append(i)
prev = i
except IndexError:
break
I haven't done any tests, but one possible algorithm might be to create a second list, and iterate through the first list. If an item is not in the second list, add it to the second list.
x = [1, 1, 2, 'a', 'a', 3]
y = []
for each in x:
if each not in y:
y.append(each)
a=[1,2,3,4,5,7,7,8,8,9,9,3,45]
def unique(l):
ids={}
for item in l:
if not ids.has_key(item):
ids[item]=item
return ids.keys()
print a
print unique(a)
Inserting elements will take theta(n)
retrieving if element is exiting or not will take constant time
testing all the items will take also theta(n)
so we can see that this solution will take theta(n).
Bear in mind that dictionary in python implemented by hash table.