Varying degree of shuffling using the random module - python

I am using two architecture programs, with visual programming plugins (Grasshopper for Rhino and Dynamo for Revit - for those that know / are interested)
Grasshopper contains a function called 'Jitter' which will shuffle a list; it takes an input from 0.0 to 1.0 that controls the degree of shuffling - 0.0 results in no shuffling, 1.0 produces a complete shuffle.
The second program (Dynamo) does not contain this functionality. It has a shuffle module (which takes a seed value), but it performs a complete random shuffle.
Ultimately the goal is to produce a series of solid and glazed panels with a slight random effect, while avoiding large clumps of solid and glazed elements - hence I want a "light shuffle".
I have written code which calculates the number of glazed (True) and solid (False) values required and then evenly distributes the True and False values based on the number of items and the percent specified.
I have checked the random module reference, but I'm not familiar with the various distributions it describes.
Could someone help out, or point me in the right direction if an existing function would achieve this?
(I have cheated slightly by appending True and False values to make up the correct number of items in the list - list3 is the final list, list2 contains the repeated module of Trues and Falses.)
Many thanks
import math
import random

percent = 30
items = 42

def remainder():
    remain = items % len(list2)
    list3.append(True)
    remain -= 1
    while remain > 0:
        list3.append(False)
        remain -= 1
    return list3

# find the repeating module of True and False values
list1 = [True] + [False] * int((100 / percent) - 1)
# multiply this list to the nearest multiple based on items
list2 = list1 * int(items / (100 / percent))
# make a copy of list2
list3 = list2[:]
# top up list3 with True/False values to match items
remainder()

# an example of a completely shuffled list - which is not desired
shuffled = random.sample(list3, k=len(list3))

Here is an approach based on this paper, which proves a result about the mixing time needed to scramble a list using swaps of adjacent items:
from random import choice
from math import log

def jitter(items, percent):
    n = len(items)
    m = n**2 * log(n)
    items = items[:]
    indices = list(range(n - 1))
    for i in range(int(percent * m)):
        j = choice(indices)
        items[j], items[j+1] = items[j+1], items[j]
    return items
A test, with each line showing the result of jitter applied to the same list with a different percent:
ls = list(('0'*20 + '1'*20)*2)
for i in range(11):
    p = i/10.0
    print(''.join(jitter(ls, p)))
Typical output:
00000000000000000000111111111111111111110000000000000000000011111111111111111111
00000000000000111100001101111011011111001010000100010001101000110110111111111111
00000000100100000101111110000110111101000001110001101001010101100011111111111110
00000001010010011011000100111010101100001111011100100000111010110111011001011111
00100001100000001101010000011010011011111011001100000111011011111011010101011101
00000000011101000110000110000010011001010110011111100100111101111011101100111110
00110000000001011001000010110011111101001111001001100101010011010111111011101100
01101100000100100110000011011000001101111111010100000100000110111011110011011111
01100010110100010100010100011000000001000101100011111011111011111011010100011111
10011100101000100010001100100000100111001111011011000100101101101010101101011111
10000000001000111101101011000011010010110011010101110011010100101101011110101110
I'm not sure how principled the above is, but it seems like a reasonable place to start.

There's no clear definition of what "degree of shuffling" (d) means, so you'll need to choose one. One option would be: "the fraction of items remaining unshuffled is (1-d)".
You could implement that as:
- Produce a list of indices
- Remove (1-d)*N of them
- Shuffle the rest
- Reinsert the ones removed
- Use these to look up values from the original data
import random

def partial_shuffle(x, d):
    """
    x: data to shuffle
    d: fraction of data to shuffle (0.0 = none, 1.0 = all)
    """
    n = len(x)
    dn = int(d*n)
    indices = list(range(n))
    random.shuffle(indices)
    ind_fixed, ind_shuff = indices[dn:], indices[:dn]
    # copy across the fixed values
    result = x[:]
    # shuffle the shuffled values
    for src, dest in zip(ind_shuff, sorted(ind_shuff)):
        result[dest] = x[src]
    return result

The other algorithms you're referring to are probably using the Fisher-Yates shuffle under the hood.
This O(n) shuffle starts with the first element of an array and swaps it with a random higher element, then swaps the second element with a random higher element, and so on.
Naturally, stopping this shuffle before you reach the last element at some fraction [0,1] would give a partially-randomized array, like you want.
Unfortunately, the effect of the foregoing is that all the "randomness" builds up on one side of the array.
Therefore, make a list of array indices, shuffle these completely, and then use the shuffled indices as the visit order for the Fisher-Yates swaps, so the partial randomization is spread across the whole array, as sketched below.
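A rough sketch of that idea (my own illustration of the description above, not code from the original answer; partial_fisher_yates and its fraction parameter are made-up names):

import random

def partial_fisher_yates(items, fraction):
    # Shuffle the visit order first so the partial randomness is
    # spread across the whole list instead of piling up at the front.
    result = items[:]
    n = len(result)
    order = list(range(n - 1))
    random.shuffle(order)
    # Run only a fraction of the Fisher-Yates passes.
    for pos in order[:int(fraction * (n - 1))]:
        j = random.randrange(pos, n)  # classic Fisher-Yates partner choice
        result[pos], result[j] = result[j], result[pos]
    return result

print(''.join(partial_fisher_yates(list('0000011111'), 0.5)))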

I believe I found a more versatile, robust, and consistent way to implement this "adjustable shuffling" technique.
import random
import numpy as np

def acc_shuffle(lis, sr, array=False, exc=None):  # "sr" = shuffling rate
    if type(lis) != list:  # make it compatible with shuffling (m x n) numpy.ndarrays
        arr = lis
        shape = arr.shape
        lis = list(arr.reshape(-1))
    lis = lis[:]  # work on a copy so changes won't affect the original input
    indices = list(range(len(lis)))
    if exc is not None:  # exclude any indices if necessary
        for ele in sorted(exc, reverse=True):
            del indices[ele]
    shuff_range = int(sr * len(lis) / 2)  # how much to shuffle (depends on shuffling rate)
    if shuff_range < 1:
        shuff_range = 1  # at least one shuffle (swap 2 elements)
    for _ in range(shuff_range):
        i = random.choice(indices)
        indices.remove(i)  # you can opt not to remove the indices for more flexibility
        j = random.choice(indices)
        indices.remove(j)
        lis[i], lis[j] = lis[j], lis[i]
    if array is True:
        return np.array(lis).reshape(shape)
    return lis

Related

Efficient algorithm to randomly find available places inside a list in Python

I need to randomly assign a place inside a list to an input. I need to check whether it is not occupied first and then use it. The best algorithm that I can come up with is the following:
import random

def get_random_addr(input_arr):
    while True:
        addr = random.randrange(1, len(input_arr))
        if input_arr[addr] is None:
            break
    return addr
This is obviously not efficient: as we occupy more slots, the loop takes longer to find an empty one, and it may even take forever (suppose only one empty slot is left). Do you have any better solutions?
How I did it
Based on the chosen answer, this is how I ended up doing it. It is very fast and efficient compared to the solutions which search through the whole list to find the None elements and then randomly choose from the retrieved set. I think the bottleneck there was the random.choice method, which seems to be very slow.
# Create a list of indices at the beginning, when all the values are None
available_index = list(range(1, len(input_arr)))
random.shuffle(available_index)

# To get a random index, simply pop from the shuffled available indices
random_index = available_index.pop()
While this method has extra O(n) memory complexity, in practice it is very efficient and fast.
If you can't use numpy, I'd keep a set of indices which are known to contain None. Every time a None is added or removed, this set of indices is updated; a minimal sketch follows.
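A minimal sketch of that bookkeeping (the class and method names here are illustrative, not from any library):

import random

class FreeSlots:
    # Track which indices of an array currently hold None so a random
    # free slot can be drawn without scanning the whole list.
    def __init__(self, input_arr):
        self.free = {i for i, v in enumerate(input_arr) if v is None}

    def occupy_random(self):
        addr = random.choice(tuple(self.free))  # pick a random free index
        self.free.discard(addr)                 # it is no longer free
        return addr

    def release(self, addr):
        self.free.add(addr)  # slot became None again

Note that converting the set to a tuple on each draw is still O(n); if draws vastly outnumber updates, the shuffled-index-list approach above will be faster.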
Your function can take arbitrarily long to return. In particular, you will get into an infinite loop if no item is None.
Instead, recover all indices which are None and use random.choices to randomly return k of them.
import random

def get_random_addr(input_arr, k=1, target=None):
    return random.choices([i for i, v in enumerate(input_arr) if v is target], k=k)

Usage

l = [0, None, 2, 3, None, None]
for i in get_random_addr(l, k=2):
    l[i] = i
print(l)  # one possible result: [0, None, 2, 3, 4, 5]
Similar to DeepSpace's idea, except with O(1) memory and O(n) time, but faster by a constant factor since on average it only iterates over half the slots in the array:
- Keep track of the number of empty slots.
- Iterate through the list.
- If a slot is empty, return your new value with probability 1/number_empty_slots.
- If we did not return and the slot is empty, redistribute the probability mass over the remaining empty slots.
Code:
import random

def get_random_addr(input_arr, num_empty_slots):
    # num_empty_slots contains the number of empty slots in input_arr
    for index, elem in enumerate(input_arr):
        if elem is None:
            if random.random() < 1 / num_empty_slots:
                return index
            num_empty_slots -= 1
Simply use enumerate to index your list first, keep only the indices whose values are None, and then use random.choice to pick an available space.
from random import choice

def get_random_addr(input_arr):
    return choice([index for index, value in enumerate(input_arr) if value is None])

print(get_random_addr([None, 1, None, 2]))
This outputs either 0 or 2 randomly, or raises an IndexError if there is no space available.
In my approach, I pick an arbitrary address in the target array. If it is free, I add it to the output list; if it is not, I map that address to an address which does contain None, nearest to the end of the list. All entries in the array beyond and including this mapped free address are dropped from the list, since they are either non-empty or already represented elsewhere in the list. I repeat this process, chopping away at the size of the target list, making it easier and easier to find new empty addresses as it proceeds. There are a few other minor details to make it all work, but I think the code below explains those better than I can with words.
from random import random

def randint(max_val):
    return int(random() * max_val)

def assign(values, target):
    output = []
    mapping = dict()
    mmax = 0
    size = len(target)
    for val in values:
        idx = randint(size)
        while target[idx] != None:
            if idx in mapping:
                idx = mapping.pop(idx)
                mmax = max(mapping or [0])
                break
            min_size = max(idx, mmax)
            try:
                size -= target[size-1:min_size:-1].index(None)
            except:
                size = min_size + 1
            if target[size-1] == None:
                size -= 1
                mapping[idx] = size
                if idx > mmax:
                    mmax = idx
            elif size-1 in mapping:
                size -= 1
                mapping[idx] = mapping.pop(size)
                mmax = max(mapping or [0])
            idx = randint(size)
        target[idx] = val
        output.append(idx)
    return output
Note that this modifies the target list passed to it. If you don't want to modify it, you really have two options: implement a bit of extra logic to check if the "free" address is already consumed, or copy the entire list (in which case, reverse it and patch the indices, so that the .index() can work on the list directly, which is the major time sink anyway).
I'd also recommend verifying that the solutions it produces are valid. I've done some testing on my part, but I very well could have missed something.
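If it helps, here is a small usage sketch of the assign function above (the data values are made up for illustration):

# Hypothetical demo: place three values into the free (None) slots.
target = [None, 7, None, None, 3, None]
values = ['a', 'b', 'c']

positions = assign(values, target)
print(positions)  # e.g. [5, 0, 2] - the index where each value landed
print(target)     # three of the None slots are now occupied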

What are the time/space differences between random.shuffle(x) and random.sample(x, len(x))? [duplicate]

I have a list a_tot with 1500 elements and I would like to divide this list into two lists in a random way. List a_1 would have 1300 and list a_2 would have 200 elements. My question is about the best way to randomize the original list with 1500 elements. When I have randomized the list, I could take one slice with 1300 and another slice with 200.
One way is to use random.shuffle, another way is to use random.sample. Are there any differences in the quality of the randomization between the two methods? The data in a_1 should be a random sample, as should the data in a_2.
Any recommendations?
using shuffle:
random.shuffle(a_tot) #get a randomized list
a_1 = a_tot[0:1300] #pick the first 1300
a_2 = a_tot[1300:] #pick the last 200
using sample:
new_t = random.sample(a_tot,len(a_tot)) #get a randomized list
a_1 = new_t[0:1300] #pick the first 1300
a_2 = new_t[1300:] #pick the last 200
The source for shuffle:
def shuffle(self, x, random=None, int=int):
    """x, random=random.random -> shuffle list x in place; return None.

    Optional arg random is a 0-argument function returning a random
    float in [0.0, 1.0); by default, the standard random.random.
    """
    if random is None:
        random = self.random
    for i in reversed(xrange(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = int(random() * (i+1))
        x[i], x[j] = x[j], x[i]
The source for sample:
def sample(self, population, k):
    """Chooses k unique random elements from a population sequence.

    Returns a new list containing elements from the population while
    leaving the original population unchanged. The resulting list is
    in selection order so that all sub-slices will also be valid random
    samples. This allows raffle winners (the sample) to be partitioned
    into grand prize and second place winners (the subslices).

    Members of the population need not be hashable or unique. If the
    population contains repeats, then each occurrence is a possible
    selection in the sample.

    To choose a sample in a range of integers, use xrange as an argument.
    This is especially fast and space efficient for sampling from a
    large population: sample(xrange(10000000), 60)
    """

    # XXX Although the documentation says `population` is "a sequence",
    # XXX attempts are made to cater to any iterable with a __len__
    # XXX method. This has had mixed success. Examples from both
    # XXX sides: sets work fine, and should become officially supported;
    # XXX dicts are much harder, and have failed in various subtle
    # XXX ways across attempts. Support for mapping types should probably
    # XXX be dropped (and users should pass mapping.keys() or .values()
    # XXX explicitly).

    # Sampling without replacement entails tracking either potential
    # selections (the pool) in a list or previous selections in a set.

    # When the number of selections is small compared to the
    # population, then tracking selections is efficient, requiring
    # only a small set and an occasional reselection. For
    # a larger number of selections, the pool tracking method is
    # preferred since the list takes less space than the
    # set and it doesn't suffer from frequent reselections.

    n = len(population)
    if not 0 <= k <= n:
        raise ValueError, "sample larger than population"
    random = self.random
    _int = int
    result = [None] * k
    setsize = 21  # size of a small set minus size of an empty list
    if k > 5:
        setsize += 4 ** _ceil(_log(k * 3, 4))  # table size for big sets
    if n <= setsize or hasattr(population, "keys"):
        # An n-length list is smaller than a k-length set, or this is a
        # mapping type so the other algorithm wouldn't work.
        pool = list(population)
        for i in xrange(k):  # invariant: non-selected at [0,n-i)
            j = _int(random() * (n-i))
            result[i] = pool[j]
            pool[j] = pool[n-i-1]  # move non-selected item into vacancy
    else:
        try:
            selected = set()
            selected_add = selected.add
            for i in xrange(k):
                j = _int(random() * n)
                while j in selected:
                    j = _int(random() * n)
                selected_add(j)
                result[i] = population[j]
        except (TypeError, KeyError):  # handle (at least) sets
            if isinstance(population, list):
                raise
            return self.sample(tuple(population), k)
    return result
As you can see, in both cases the randomization is done by the line int(random() * n), so the underlying algorithm is essentially the same.
There are two major differences between shuffle() and sample():
1) Shuffle will alter data in-place, so its input must be a mutable sequence. In contrast, sample produces a new list and its input can be much more varied (tuple, string, xrange, bytearray, set, etc).
2) Sample lets you potentially do less work (i.e. a partial shuffle).
It is interesting to show the conceptual relationship between the two by demonstrating that it would have been possible to implement shuffle() in terms of sample():
def shuffle(p):
    p[:] = sample(p, len(p))

Or vice-versa, implementing sample() in terms of shuffle():

def sample(p, k):
    p = list(p)
    shuffle(p)
    return p[:k]
Neither of these is as efficient as the real implementations of shuffle() and sample(), but it does show their conceptual relationship.
The randomization should be just as good with both options. I'd say go with shuffle, because it's more immediately clear to the reader what it does.
random.shuffle() shuffles the given list in-place. Its length stays the same.
random.sample() picks n items out of the given sequence without replacement (which also might be a tuple or whatever, as long as it has a __len__()) and returns them in randomized order.
I think they are much the same, except that one updates the original list in place while the other only reads it. There is no difference in quality.
from random import shuffle
from random import sample
x = [[i] for i in range(10)]
shuffle(x)
sample(x,10)
shuffle updates the list in place, while sample returns an updated copy; sample also lets you specify the number of items to pick, whereas shuffle always yields a list of the same length as the input.

Generating r-length permutations of list with repeated elements python

My problem is similar to the question asked here. Differing from that question, I need an algorithm which generates r-tuple permutations of a given list with repeated elements.
An example:

list1 = [1, 1, 1, 2, 2]
for i in permu(list1, 3):
    print i
[1,1,1]
[1,1,2]
[1,2,1]
[2,1,1]
[1,2,2]
[2,1,2]
[2,2,1]
It seems that itertools.permutations would work fine here with some simple filtering added to remove the repeated ones. However, in my real cases the lists are much longer than this example, and as you know, the number of permutations grows factorially as the length of the list increases.
So far, what I have is below. This code does the described job, but it is not efficient.
import itertools

def generate_paths(paths, N=None):
    groupdxs = [i for i, group in enumerate(paths) for _ in range(len(group))]
    oldCombo = []
    result = []
    for dxCombo in itertools.permutations(groupdxs, N):
        if dxCombo <= oldCombo:  # a simple filter
            continue
        oldCombo = dxCombo
        parNumbers = partialCombinations(dxCombo, len(paths))
        if not parNumbers.count(0) >= len(paths) - 1:  # skip combos where all nodes come from the same path (same graph)
            groupTemps = []
            for groupInd in range(len(parNumbers)):
                groupTemp = [x for x in itertools.combinations(paths[groupInd], parNumbers[groupInd])]
                groupTemps.append(groupTemp)
            for parGroups in itertools.product(*groupTemps):
                iters = [iter(group) for group in parGroups]
                p = [next(iters[i]) for i in dxCombo]
                result.append(p)
    return result

def partialCombinations(combo, numGroups):
    tempCombo = list(combo)
    result = [0] * numGroups
    for x in tempCombo:
        result[x] += 1
    return result
In the first for loop, I need to generate all possible r-length tuples, which makes the algorithm slow. There is a good solution for permutations without the r-length restriction in the link above. How can I adapt that algorithm to mine? Or are there any better ways?
I haven't thought this through very well for your case, but here's another approach.
Instead of giving large lists to permutations, you could give a small list that has no duplicates. You can use combinations_with_replacement to generate these smaller lists (you'll need to filter them to match the quantities of duplicates from your original input) and then get the permutations of each combination.
import itertools

possible_values = (1, 2)
n_positions = 3

sorted_combinations = itertools.combinations_with_replacement(possible_values, n_positions)
unique_permutations = set()
for combo in sorted_combinations:
    # TODO: Do filtering for acceptable combinations before passing to permutations.
    for p in itertools.permutations(combo):
        unique_permutations.add(p)

print "len(unique_permutations) = %i. It should be %i^%i = %i.\nPermutations:" % (len(unique_permutations), len(possible_values), n_positions, pow(len(possible_values), n_positions))
for p in unique_permutations:
    print p

Pick N distinct items at random from sequence of unknown length, in only one iteration

I am trying to write an algorithm that would pick N distinct items from a sequence at random, without knowing the size of the sequence in advance, and where it is expensive to iterate over the sequence more than once. For example, the elements of the sequence might be the lines of a huge file.
I have found a solution when N=1 (that is, "pick exactly one element at random from a huge sequence"):
import random

items = range(1, 10)  # Imagine this is a huge sequence of unknown length
count = 1
selected = None
for item in items:
    if random.random() * count < 1:
        selected = item
    count += 1
But how can I achieve the same thing for other values of N (say, N=3)?
If your sequence is short enough that reading it into memory and randomly sorting it is acceptable, then a straightforward approach would be to just use random.shuffle:
import random

arr = [1, 2, 3, 4]
# In-place shuffle
random.shuffle(arr)
# Take the first 2 elements of the now randomized array
print arr[0:2]
[1, 3]
Depending upon the type of your sequence, you may need to convert it to a list by calling list(your_sequence) on it, but this will work regardless of the types of the objects in your sequence.
Naturally, if you can't fit your sequence into memory or the memory or CPU requirements of this approach are too high for you, you will need to use a different solution.
Use reservoir sampling. It's a very simple algorithm that works for any N.
Here is one Python implementation, and here is another.
The simplest I've found is this answer on SO, improved a bit below:
import random
my_list = [1, 2, 3, 4, 5]
how_big = 2
new_list = random.sample(my_list, how_big)
# To preserve the order of the list, you could do:
randIndex = random.sample(range(len(my_list)), how_big)
randIndex.sort()
new_list = [my_list[i] for i in randIndex]
If you have Python 3.6+ you can use random.choices. Note that choices samples with replacement, so the picked items are not guaranteed to be distinct:
from random import choices
items = range(1, 10)
new_items = choices(items, k = 3)
print(new_items)
[6, 3, 1]
@NPE is correct, but the implementations that are being linked to are sub-optimal and not very "pythonic". Here's a better implementation:
import random

def sample(iterator, k):
    """
    Samples k elements from an iterable object.

    :param iterator: an iterator (call iter() on a plain iterable first)
    :param k: the number of items to sample
    """
    # fill the reservoir to start
    result = [next(iterator) for _ in range(k)]
    n = k - 1
    for item in iterator:
        n += 1
        s = random.randint(0, n)
        if s < k:
            result[s] = item
    return result
Edit As #panda-34 pointed out the original version was flawed, but not because I was using randint vs randrange. The issue is that my initial value for n didn't account for the fact that randint is inclusive on both ends of the range. Taking this into account fixes the issue. (Note: you could also use randrange since it's inclusive on the minimum value and exclusive on the maximum value.)
The following will give you N random items from an array X (note that it samples with replacement, so the result may contain duplicates):
import random
list(map(lambda _: random.choice(X), range(N)))
It should be enough to accept or reject each new item just once, and, if you accept it, throw out a randomly chosen old item.
Suppose you have selected N items of K at random and you see a (K+1)th item. Accept it with probability N/(K+1) and its probabilities are OK. The current items got in with probability N/K, and get thrown out with probability (N/(K+1))(1/N) = 1/(K+1) so survive through with probability (N/K)(K/(K+1)) = N/(K+1) so their probabilities are OK too.
And yes I see somebody has pointed you to reservoir sampling - this is one explanation of how that works.
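Here is a short sketch of that accept/replace argument in code (my illustration of the explanation above, not taken from a library):

import random

def reservoir_sample(iterable, n):
    # Keep n items from a stream of unknown length; after K items have
    # been seen, each survives in the sample with probability n/K.
    sample = []
    for k, item in enumerate(iterable, start=1):
        if k <= n:
            sample.append(item)                  # fill the reservoir
        elif random.random() < n / k:            # accept item k w.p. n/k
            sample[random.randrange(n)] = item   # evict a random old item
    return sample

print(reservoir_sample(range(1000), 3))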
As aix mentioned, reservoir sampling works. Another option is to generate a random number for every number you see and select the top k numbers.
To do it iteratively, maintain a heap of k (random number, number) pairs, and whenever you see a new number, insert it into the heap if its random key is greater than the smallest key in the heap (replacing that entry); a sketch follows.
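A minimal sketch of that heap variant (illustrative only; the function name is made up):

import heapq
import random

def sample_via_heap(iterable, k):
    # Assign every item a uniform random key and keep the k items with
    # the largest keys; those k items form a uniform random sample.
    heap = []  # min-heap of (random_key, item) pairs
    for item in iterable:
        key = random.random()
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))  # drop the smallest key
    return [item for _, item in heap]

print(sample_via_heap(range(1000), 3))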
This was my answer to a duplicate question (closed before I could post) that was somewhat related ("generating random numbers without any duplicates"). Since it is a different approach than the other answers, I'll leave it here in case it provides additional insight.
from random import randint

random_nums = []
N = ...  # however many random numbers you want
r = ...  # lower bound of the number range
R = ...  # upper bound of the number range
x = 0
while x < N:
    random_num = randint(r, R)  # inclusive range
    if random_num in random_nums:
        continue
    else:
        random_nums.append(random_num)
        x += 1
The reason for using a while loop rather than a for loop is that it guarantees exactly N numbers are produced: if you draw 3 duplicates along the way, you keep going rather than ending up with only N-3 numbers.
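As a small variant of the same idea, storing the drawn numbers in a set makes the duplicate check O(1) instead of the O(N) list scan above (the function name is illustrative):

from random import randint

def distinct_randints(n, lo, hi):
    # Draw n distinct integers from the inclusive range [lo, hi].
    seen = set()
    while len(seen) < n:
        seen.add(randint(lo, hi))  # duplicates are simply absorbed
    return list(seen)

print(distinct_randints(5, 1, 100))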
There's one implementation from the numpy library.
Assuming that N is smaller than the length of the array, you'd have to do the following:
import numpy as np

# my_array is the array to be sampled from
assert N <= len(my_array)
indices = np.random.permutation(N)  # generates shuffled indices from 0 to N-1
sampled_array = my_array[indices]
If you need to sample from the whole array and not just the first N positions, then you can use:

import random
sampled_array = my_array[random.sample(range(len(my_array)), N)]
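For completeness, a numpy-only alternative (assuming my_array is a numpy array): np.random.choice can draw N distinct indices in one call when replace=False.

import numpy as np

my_array = np.arange(100) * 2  # illustrative data
N = 10
indices = np.random.choice(len(my_array), size=N, replace=False)
sampled_array = my_array[indices]
print(sampled_array)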

Very fast sampling from a set with fixed number of elements in python

I need to sample uniformly at random a number from a set with fixed size, do some calculation, and put the new number back into the set. (The number of samples needed is very large.)
I've tried storing the numbers in a list and using random.choice() to pick an element, remove it, and then append the new element. But that's way too slow!
I'm thinking of storing the numbers in a numpy array, sampling a list of indices, and performing the calculation for each index.
Are there any faster ways of doing this process?
Python lists are implemented internally as arrays (like Java ArrayLists, C++ std::vectors, etc.), so removing an element from the middle is relatively slow: all subsequent elements have to be reindexed. (See http://www.laurentluce.com/posts/python-list-implementation/ for more on this.) Since the order of elements doesn't seem to be relevant to you, I'd recommend you just use random.randint(0, len(L) - 1) to choose an index i, then use L[i] = calculation(L[i]) to update the ith element.
I need to sample uniformly at random a number from a set with fixed size, do some calculation, and put the new number back into the set.
from random import randrange

s = list(someset)  # store the set as a list
while 1:
    i = randrange(len(s))    # choose a random element
    x = s[i]
    y = your_calculation(x)  # do some calculation
    s[i] = y                 # put the new number back into the set
random.sample(a set or list or Numpy array, Nsample) is very fast, but it's not clear to me if you want anything like this:
import random

Setsize = 10000
Samplesize = 100
Max = 1 << 20
bigset = set(random.sample(xrange(Max), Setsize))  # initial subset of 0 .. Max

def calc(aset):
    return set(x + 1 for x in aset)  # << your code here

# sample, calc a new subset of bigset, add it --
for iter in range(3):
    asample = random.sample(bigset, Samplesize)
    newset = calc(asample)  # new subset of 0 .. Max
    bigset |= newset
You could use Numpy arrays or bitarray instead of a set, but I'd expect the time in calc() to dominate.
What are your Setsize and Samplesize, roughly?
