help(random.sample)
says: "The resulting list is in selection order so that all sub-slices will also be valid random samples."
What does selection order mean? If there were no requirement for selection order, what would the resulting list look like? And how could a sub-slice fail to be a valid random sample?
Update: as far as I understand, it means the results will probably not be sorted in any way.
random.sample(population, k)
Given a population sequence, it returns a list of length k with elements chosen (or selected) from the population. Selection order refers to the order in which the elements are selected, which is random. The list is thus ordered not by the elements' indexes in the population but by when each element was drawn. As a result, any sub-slice of the returned list is also a valid random sample of the population.
Example -
>>> import random
>>> population=[1,2,3,4,5,6,7,8,9,10,11,12,]
>>> ls=random.sample(population,5)
>>> ls
[1, 11, 7, 12, 6]
The returned list has its elements in the order they were selected, so you can take sub-slices of ls without losing randomness:
>>> ls[:3]
[1, 11, 7]
If selection order were not enforced, ls could look like
[1, 6, 7, 11, 12]
A sub-slice would then not be completely random but constrained by the slice length. E.g. the greatest value could never occur in a sub-slice of length 3 (in this case that slice would always be [1, 6, 7]).
The full help string is:
sample(self, population, k) method of random.Random instance
Chooses k unique random elements from a population sequence.
Returns a new list containing elements from the population while
leaving the original population unchanged. The resulting list is
in selection order so that all sub-slices will also be valid random
samples. This allows raffle winners (the sample) to be partitioned
into grand prize and second place winners (the subslices).
Members of the population need not be hashable or unique. If the
population contains repeats, then each occurrence is a possible
selection in the sample.
To choose a sample in a range of integers, use xrange as an argument.
This is especially fast and space efficient for sampling from a
large population: sample(xrange(10000000), 60)
So taking the example of a raffle; all the tickets rolling around inside the drum are the population, and k is the number of tickets drawn. The set of all the tickets drawn is the result of the random sample.
The sample is not sorted, nor altered in any way, it is in the order it is drawn. If you imagine that you went to a raffle, and they drew 100 tickets first, and discarded them, and then started drawing the actual tickets, the set of winning tickets would still be a random sample of the population. This is equivalent to taking slices of the first larger sample.
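The raffle analogy can be sketched directly with random.sample (the ticket count and slice sizes here are just illustrative):

```python
import random

tickets = list(range(1, 101))          # the population: tickets 1..100
winners = random.sample(tickets, 10)   # draw 10 winners, in selection order
grand_prize = winners[:3]              # the first three tickets drawn
second_place = winners[3:]             # the remaining seven

# every slice of `winners` is itself a valid random sample of `tickets`
```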
What it's saying is that any sub-slice of a sample is still a valid random sample.
To answer your questions:
Selection order is just the order in which the values are drawn to make up the sample.
Without selection order being ensured, the sample might be sorted in some way.
You can imagine the following code creating a random sample while preserving selection order:
import random

def sample(population, k):
    sample = []
    popsize = len(population) - 1
    while len(sample) < k:
        r = population[random.randint(0, popsize)]
        if r not in sample:
            sample.append(r)
    return sample
I tried to find the 3 nearest values (smallest distances) in the list to implement my kNN model. While trying to do so, I used the method that was intuitive to me; the code was something like the following:
first_k = X_train['distance'].sort_values().head(k)
prediction = first_k.value_counts().idxmax()
The first_k list contains the first k elements from the sorted values of the distance column. Prediction is what the model will return at last.
Another approach I found on the internet was this:
prediction = y_train[X_train["distance"].nsmallest(n=k).index].mode()[0]
The second approach yields the correct results, but my approach did not work as intended. Can someone explain why?
The difference is the use of .index after nsmallest(n=k) in the alternative approach. What your code does is the following:
Sort X using distance as the sorting key, then take the first k elements of the sorted dataset.
Check the distance frequencies and take the first occurrence of the most frequent distance.
The alternative approach instead does the following steps:
Recover the k smallest elements in the distance column
Get the corresponding index values of the rows recovered in the previous step (for example, with k=5 it could be something that prints as Int64Index([3, 9, 10, 1, 8], dtype='int64')).
Recover in y the labels with the same index values of the ones recovered in the previous step
Get the most frequent label in y (or the mode)
So, as you can see, the main difference is that the most frequent distance is not necessarily the most frequent class among the k neighbours you recovered.
Anyway, your code can easily be fixed:
first_k = X_train['distance'].sort_values().head(k).index
prediction = y_train[first_k].mode()[0]
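To see the difference concretely, here is a tiny made-up dataset (the distances and labels are invented for illustration): the three nearest rows are 2, 4 and 0, and it is their labels, not the frequency of any distance value, that decide the prediction.

```python
import pandas as pd

X_train = pd.DataFrame({'distance': [0.5, 2.0, 0.1, 1.5, 0.3]})
y_train = pd.Series(['a', 'b', 'a', 'b', 'b'])
k = 3

# indices of the k smallest distances: rows 2 (0.1), 4 (0.3), 0 (0.5)
first_k = X_train['distance'].sort_values().head(k).index
# most frequent label among those neighbours
prediction = y_train[first_k].mode()[0]
```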
How to randomly partition given array with given bin sizes?
Is there a built-in function for that? For example, I want something like
function(12, (2,3,3,2,2)) to output five partitions of the numbers from 1 to 12 (or 0 to 11, it doesn't matter). The output could be a list like [[3,4],[7,8,11],[12,1,2],[5,9],[6,10]] (or some other efficient data structure). The first argument of the function may be just a number n, in which case it considers np.arange(n) as the input; otherwise it may be any other ndarray.
Of course we can randomly permute the list and then pick the first 2, the next 3, the next 3, the next 2 and the last 2 elements. But does something more efficient exist?
The numpy.partition() function means something different: it performs a single partitioning step of quicksort. I also couldn't find such a function in the numpy.random submodule.
Try the following solution:
import numpy as np

def func(a, b):
    # a is an integer and b is a Python list of bin sizes
    indx = np.random.rand(a).argsort()  # randomly permuted indices
    b = np.array(b)
    return np.split(indx, b.cumsum()[:-1])  # split the permuted indices at the bin boundaries
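Here is the same idea as a self-contained sketch (the function name is mine), along with a quick check that the bins come out with the requested sizes:

```python
import numpy as np

def random_partition(n, sizes):
    # randomly permuted indices 0..n-1
    idx = np.random.rand(n).argsort()
    # split the permutation at the cumulative bin boundaries
    return np.split(idx, np.cumsum(sizes)[:-1])

parts = random_partition(12, [2, 3, 3, 2, 2])
```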
I want to emulate the functionality of random.sample() in python, but with a non-uniform (triangular, in this case) distribution of choices. Important for this is that a single item is not chosen twice (as described in the random.sample docs). Here's what I have:
...
def tri_sample(population, k, mode=0):
    """
    Mimics the functionality of random.sample() but with a triangular
    distribution over the length of the sequence.

    Mode defaults to 0, which favors lower indices.
    """
    psize = len(population)
    if k > psize:
        raise ValueError("k must be less than the number of items in population.")
    if mode > psize:
        raise ValueError("mode must be less than the number of items in population.")
    indices_chosen = []
    sample = []
    for i in range(k):
        # This ensures unique selections
        while True:
            choice = math.floor(random.triangular(0, psize, mode))
            if choice not in indices_chosen:
                break
        indices_chosen.append(choice)
        sample.append(population[choice])
    return sample
...
My suspicion is that this is not an ideal way of preventing duplicate items from being pulled. My first thought when designing this was to make a duplicate of population and .pop() the items as they're sampled, to prevent choosing the same item twice, but I saw two problems with that:
If population is a list of objects, there could be some difficulty in duplicating the list while still ensuring that the items in sample point to the same objects in population.
Using .pop() on the population would change its size, altering the distribution after each draw. Ideally, the distribution (not sure if I'm using the term correctly; the probability of each item being chosen) would stay the same no matter what order the items are drawn in.
Is there a more efficient way of taking a non-uniform random sample from a population?
You can achieve what you want by using numpy.random.choice
The input to this function is as follows:
numpy.random.choice(a, size=None, replace=True, p=None)
so you can pass the weight vector p as your desired probability distribution, and set replace=False so that samples are not repeated.
Alternatively, you could sample directly from the triangular distribution using numpy.random.triangular. You can do that in a loop, and add the new result to the list only if it did not appear there before.
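A minimal sketch of the first suggestion, using linearly decreasing weights as a stand-in for a triangular distribution with mode 0 (the population and the weight shape here are made up for illustration):

```python
import numpy as np

population = list("abcdefghij")
n = len(population)

# weights fall off linearly from index 0, roughly triangular with mode 0
weights = np.arange(n, 0, -1, dtype=float)
weights /= weights.sum()          # p must sum to 1

# 4 distinct items, lower indices favoured
picked = np.random.choice(population, size=4, replace=False, p=weights)
```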
I am fairly new to Python and I am stuck on a particular question, so I thought I'd ask you guys.
The following contains my code so far, as well as the questions that lie therein:
list = [100, 20, 30, 40, etc...]
Just a list with different numeric values representing objects' weights in grams.
object = 0
while len(list) > 0:
    list_calculation = list.pop(0)
    print("object number:", object, "evaluates to")
What I want to do next is evaluate the items in the list. So if we go with index [0], we have a list value of 100. Then I want to break this into smaller pieces: for a 100 gram object, one would split it into five 20 gram units. If the value being split were 35, it would be one 20 gram unit, one 10 gram unit and one 5 gram unit.
The five units I want to split into are: 20, 10, 5, 1 and 0.5.
If anyone has a quick tip regarding my issue, it would be much appreciated.
Regards
You should think about solving this for a single number first. What you essentially want to do is split a number into a partition of known components; this is also known as the change-making problem. You can use a greedy algorithm that always takes the largest component size for as long as it still fits:
units = [20, 10, 5, 1, 0.5]

def change(number):
    counts = {}
    for unit in units:
        count, number = divmod(number, unit)
        counts[unit] = count
    return counts
So this will return a dictionary that maps from each unit to the count of that unit required to get to the target number.
You just need to call that function for each item in your original list.
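Calling the function on the 35-gram example from the question (redefined here so the snippet runs on its own):

```python
units = [20, 10, 5, 1, 0.5]

def change(number):
    counts = {}
    for unit in units:
        count, number = divmod(number, unit)
        counts[unit] = count
    return counts

result = change(35)
# one 20, one 10 and one 5 gram unit; nothing smaller is needed
```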
One way you could do it is with a double for loop. The outer loop iterates over the numbers you input, and the inner loop over the unit values you want to evaluate (i.e. [20, 10, 5, 1, 0.5]). On each iteration of the inner loop, find how many times the value goes into the number (using math.floor), then use the modulo operator to reassign the number to the remainder. On each loop you can print the info you want. I'm not sure exactly what kind of output you're looking for, but I hope this helps!
Ex:
import math

myList = [100, 20, 30, 40, 35]
values = [20, 10, 5, 1, 0.5]

for i in myList:
    print(str(i) + " evaluates to: ")
    for num in values:
        evaluation = math.floor(i / num)
        print("\t" + str(num) + "'s: " + str(evaluation))
        i %= num
There is a list of float values, which may or may not be distinct. How can I find a randomly chosen list index of one of the highest values in this list?
If the context is interesting to you:
I'm trying to write a solver for the pen-and-paper game Battleship. I calculate a hit probability for each field and then want the solver to shoot at one of the most likely spots, which means retrieving the index of the highest likelihood in my likelihood list and telling the game engine that index as my choice. Already the first move shows that many fields can have the same likelihood. In this case it makes sense to choose one of them at random (and not always just take the first).
Find the maximum as in "How to find all positions of the maximum value in a list?", then pick one of those positions at random using random.choice:
>>> import random
>>> m = max(a)
>>> max_pos = [i for i, j in enumerate(a) if j == m]
>>> random.choice(max_pos)
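The two steps can be wrapped into a small helper (the function name and the example list are mine):

```python
import random

def random_argmax(values):
    # return the index of a randomly chosen maximum element
    m = max(values)
    return random.choice([i for i, v in enumerate(values) if v == m])

likelihoods = [0.1, 0.9, 0.4, 0.9, 0.9]
# random_argmax(likelihoods) returns 1, 3 or 4, each equally likely
```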