I'm currently trying to get an array of numbers like this one randomly shuffled:
import numpy as np
label_array = np.repeat(np.arange(6), 12)
The only constraint is that no two consecutive elements of the shuffled array may be the same number. For that I'm currently using this code:
# Check if there are any occurrences of two consecutive
# elements being of the same category (same number)
num_occurrences = np.sum(np.diff(label_array) == 0)
# While there are any occurrences of this...
while num_occurrences != 0:
    # ...shuffle the array...
    np.random.shuffle(label_array)
    # ...create a flag for occurrences...
    flag = np.hstack(([False], np.diff(label_array) == 0))
    flag_array = label_array[flag]
    # ...and shuffle them.
    np.random.shuffle(flag_array)
    # Then re-assign them to the original array...
    label_array[flag] = flag_array
    # ...and check the number of occurrences again.
    num_occurrences = np.sum(np.diff(label_array) == 0)
Although this works for an array of this size, I don't know whether it would work for much bigger arrays, and even if it does, it may take a lot of time.
So, is there a better way of doing this?
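This may not be the best answer technically, but hopefully it suffices for your requirements. It builds the result from shuffled blocks of `block_length` distinct values, and swaps the first and last element of a block whenever the block would start with the value the previous block ended with.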
import numpy as np

def generate_random_array(block_length, block_count):
    randoms_array = []
    for _ in range(block_count):
        # each block is a random permutation of 0..block_length-1,
        # so there are never equal neighbours inside a block
        nums = np.arange(block_length)
        np.random.shuffle(nums)
        # avoid a repeat across the block boundary (assumes block_length >= 2)
        if randoms_array and nums[0] == randoms_array[-1]:
            nums[0], nums[-1] = nums[-1], nums[0]
        randoms_array.extend(nums)
    return randoms_array

generate_random_array(block_length=1000, block_count=1000)
Here is a way to do it for Python >= 3.6, using random.choices, which lets you choose from a population with weights.
The idea is to generate the numbers one by one. Each time we generate a new number, we exclude the previous one by temporarily setting its weight to zero. Then, we decrement the weight of the chosen one.
As @roganjosh duly noted, there is a problem at the end when we are left with more than one instance of the last value, and that can happen quite often, especially with a small number of values and a large number of repeats.
The solution I used is to insert these values back into the list at positions where they don't create a conflict, using the short send_back function.
import random

def send_back(value, number, lst):
    # re-insert `number` copies of `value`, scanning backwards from the end
    # for a gap whose two neighbours both differ from `value`
    idx = len(lst)-2
    for _ in range(number):
        while lst[idx] == value or lst[idx-1] == value:
            idx -= 1
        lst.insert(idx, value)

def shuffle_without_doubles(nb_values, repeats):
    population = list(range(nb_values))
    weights = [repeats] * nb_values
    out = []
    prev = None
    for _ in range(nb_values * repeats):
        if prev is not None:
            # remove prev from the list of possible choices
            # by turning its weight temporarily to zero
            old_weight = weights[prev]
            weights[prev] = 0
        try:
            chosen = random.choices(population, weights)[0]
        except (IndexError, ValueError):
            # We are here because all of our weights are 0,
            # which means that all that is left to choose from
            # is old_weight times the previous value.
            # (Python < 3.9 raises IndexError here, 3.9+ raises ValueError.)
            send_back(prev, old_weight, out)
            break
        out.append(chosen)
        weights[chosen] -= 1
        if prev is not None:
            # restore weight
            weights[prev] = old_weight
        prev = chosen
    return out

print(shuffle_without_doubles(6, 12))
print(shuffle_without_doubles(6, 12))
[5, 1, 3, 4, 3, 2, 1, 5, 3, 5, 2, 0, 5, 4, 3, 4, 5,
3, 4, 0, 4, 1, 0, 1, 5, 3, 0, 2, 3, 4, 1, 2, 4, 1,
0, 2, 0, 2, 5, 0, 2, 1, 0, 5, 2, 0, 5, 0, 3, 2, 1,
2, 1, 5, 1, 3, 5, 4, 2, 4, 0, 4, 2, 4, 0, 1, 3, 4,
5, 3, 1, 3]
Some crude timing: shuffle_without_doubles(600, 1200), which generates 720000 values, takes about 30 seconds.
I came here from "Creating a list without back-to-back repetitions from multiple repeating elements" (referred to as "problem A") while organising my notes, and there was no correct answer under "problem A" nor under the current one. These two problems also seem different, because problem A requires the same elements.
Basically, what you asked is the same as an algorithm problem (link) in which randomness is not required. But when almost half of all the numbers are the same, the result can only look like "ABACADAEA...", where "ABCDE" are numbers. The most voted answer to that problem uses a priority queue, so the time complexity is O(n log m), where n is the length of the output and m is the number of distinct options.
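For reference, the priority-queue idea can be sketched as below. This is a minimal version of my own (the function name `rearrange_no_adjacent` is hypothetical, not taken from the linked answer): it repeatedly places the value with the most remaining copies that is not the value just placed, breaking ties randomly, and returns None when no valid arrangement exists. The output never has equal neighbours, but it is not uniformly random; with equal counts, as in the question, it tends to emit the values in rounds.

```python
import heapq
import random
from collections import Counter


def rearrange_no_adjacent(values):
    """Greedy priority-queue rearrangement with random tie-breaking (a sketch)."""
    counts = Counter(values)
    # heap entries: (-remaining count, random tie-breaker, value)
    heap = [(-c, random.random(), v) for v, c in counts.items()]
    heapq.heapify(heap)
    out = []
    held = None  # entry just used; it may not be picked in the next step
    while heap:
        neg_count, _, value = heapq.heappop(heap)
        out.append(value)
        if held is not None:
            heapq.heappush(heap, held)  # the previous value is eligible again
            held = None
        if neg_count + 1 < 0:  # copies of `value` remain
            held = (neg_count + 1, random.random(), value)
    if held is not None:
        return None  # leftover copies could not be placed without a repeat
    return out


print(rearrange_no_adjacent([v for v in range(6) for _ in range(12)]))
```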
As for this problem, an easier way is to use itertools.permutations and randomly pick permutations whose first element differs from the last element of the previous one, so that the result looks "random".
Here is some draft code:
from itertools import permutations
from random import choice

def no_dup_shuffle(ele_count: int, repeat: int):
    """
    Return a shuffle of `ele_count` elements repeating `repeat` times.
    """
    perms = list(permutations(range(ele_count)))  # materialize once; an iterator is exhausted after one list()
    res = []
    last = [-1]  # -1 is a dummy value so the first permutation is always accepted
    for _ in range(repeat):
        curr = choice(perms)
        # re-draw until this block does not start with the value the previous block ended with
        while curr[0] == last[-1]:
            curr = choice(perms)
        res.extend(curr)
        last = curr
    return res

def test_no_dup_shuffle(count, rep):
    r = no_dup_shuffle(count, rep)
    assert len(r) == count * rep  # check result length
    assert len(set(r)) == count  # check all elements are used and in `range(count)`
    for i in range(1, len(r)):  # check no duplicate neighbours (start at 1 so r[0] is not compared with r[-1])
        assert r[i] != r[i - 1]
    print(r)

if __name__ == "__main__":
    test_no_dup_shuffle(5, 3)
    test_no_dup_shuffle(3, 17)
For example, I have a list of tokens, and each token's number of characters (length) is
length = [2, 1, 1, 2, 2, 3, 2, 1, 1, 2, 2, 2]
and here is each token's probability of [not inserting a linefeed, inserting a linefeed] after it:
prob = [[9.9978e-01, 2.2339e-04], [9.9995e-01, 4.9344e-05], [0.9469, 0.0531],
[9.9994e-01, 5.8422e-05], [0.9964, 0.0036], [9.9991e-01, 9.4295e-05],
[9.9980e-01, 1.9620e-04], [1.0000e+00, 5.2492e-08], [9.9998e-01, 1.8293e-05],
[9.9999e-01, 5.1220e-06], [1.0000e+00, 3.9795e-06], [0.0142, 0.9858]]
and the result obtained by taking the more likely option for each token is
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
which means inserting a linefeed after the last token.
The total length of this line is 21, and I would like at most 20 characters per line.
In that case, I have to insert at least one more linefeed (in other situations maybe more) to make sure every line has at most 20 characters.
In this example, the best answer is
[0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
since, apart from the last token, the 3rd token has the highest probability of a linefeed (0.0531).
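To make the constraint concrete, here is a small helper (the name `line_lengths` is hypothetical) that turns a break vector into line lengths. It assumes tokens are joined without separators, which is what the total of 21 implies.

```python
def line_lengths(length, breaks):
    """Lengths of the lines produced when breaks[i] == 1 means a linefeed after token i."""
    lines, current = [], 0
    for n_chars, brk in zip(length, breaks):
        current += n_chars
        if brk:
            lines.append(current)
            current = 0
    if current:  # trailing tokens without a final linefeed
        lines.append(current)
    return lines


length = [2, 1, 1, 2, 2, 3, 2, 1, 1, 2, 2, 2]
print(line_lengths(length, [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]))  # [21]
print(line_lengths(length, [0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1]))  # [4, 17]
```

A candidate is valid when `max(line_lengths(length, breaks)) <= 20`.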
My idea is to enumerate all combinations of these choices and score each one by the product of the chosen probabilities (multiplying them instead of adding). There are 12 tokens in this example and each token is a 0/1 classification, so there are 2^12 combinations. I record each combination as a binary string (since it's a 0/1 classification problem) and store them in a dictionary mapping each binary sequence to its combined probability:
n = len(prob)  # number of tokens
dic = {}
for num in range(2 ** n):
    # fixed-width binary string, one digit per token (1 = linefeed after it)
    str1 = format(num, '0{}b'.format(n))
    probb = 1
    for k in range(len(str1)):
        x = str1[k]
        if int(x) == 0:  # [0, 1]
            probb *= prob[k][0]
        else:
            probb *= prob[k][1]
    dic[str1] = probb
Then I want to sort all the combinations and search them from the most to the least likely.
I use two loops to build all combinations, and another two loops to search them from the most to the least likely until one meets the character limit. But I have trouble with efficiency: once there are 40 tokens, I have to evaluate 2^40 combinations.
I am not good at algorithms, so I'd like to ask whether there is an efficient way to solve this problem.
To rephrase, you have a list of tokens of given lengths, each with an
independent probability of being followed by a line break, and you want
to find the maximum likelihood outcome whose longest line doesn’t exceed
the given max.
There is an efficient dynamic program (O(n L), where n is the number of
tokens and L is the line length). The idea is that we can prevent the
search tree from blowing up exponentially by keeping, for each possible
current line length, only the most likely option. Likelihoods are summed
as log2 probabilities, which avoids underflow. In Python:
import collections
import math

length = [2, 1, 1, 2, 2, 3, 2, 1, 1, 2, 2, 2]
prob = [
    [9.9978e-1, 2.2339e-4],
    [9.9995e-1, 4.9344e-5],
    [0.9469, 0.0531],
    [9.9994e-1, 5.8422e-5],
    [0.9964, 0.0036],
    [9.9991e-1, 9.4295e-5],
    [9.998e-1, 1.962e-4],
    [1.0e0, 5.2492e-8],
    [9.9998e-1, 1.8293e-5],
    [9.9999e-1, 5.122e-6],
    [1.0e0, 3.9795e-6],
    [0.0142, 0.9858],
]
max_line_length = 20

# current line length -> best (log2-likelihood, breaks so far) ending with that line length
line_length_to_best = {length[0]: (0, [])}
for i, (p_no_break, p_break) in enumerate(prob[:-1]):
    line_length_to_options = collections.defaultdict(list)
    for line_length, (likelihood, breaks) in line_length_to_best.items():
        length_without_break = line_length + length[i + 1]
        if length_without_break <= max_line_length:
            line_length_to_options[length_without_break].append(
                (likelihood + math.log2(p_no_break), breaks + [0])
            )
        line_length_to_options[length[i + 1]].append(
            (likelihood + math.log2(p_break), breaks + [1])
        )
    # keep only the most likely option per line length
    line_length_to_best = {
        line_length: max(options)
        for (line_length, options) in line_length_to_options.items()
    }
_, breaks = max(line_length_to_best.values())
print(breaks + [1])
Let's say I have a list of values: [0, 10, 20, 10, 0, 10, 20, 10, 0, ...]
Clearly there's periodicity: the values go 0, 10, 20, 10 and then start over, so one cycle spans 5 entries from one 0 to the next (i.e. it repeats every 4 entries). I want to measure the average periodicity, or the average number of entries it takes to complete a cycle, in the list above.
This seems similar to measuring autocorrelation, but I don't know where to begin to get some sort of measure of the "frequency" or "periodicity", i.e. how fast a cycle is completed.
Minimal version:
a = [0, 10, 20, 10, 0, 10, 20, 10, 0, 10, 20]
n = len(a)
# The idea is to compare the repeated subset of the array with the original array
# while keeping the sizes equal
periods = [i for i in range(2, n//2 + 1) if a[:i]*(n//i) == a[:n - n % i]]
print('Min period:', periods[0], '\n', a[:periods[0]])
Output:
Min period: 4
[0, 10, 20, 10]
for-loop version:
Here is the same idea with a for-loop, just to make it clearer:
a = [0, 10, 20, 10, 0, 10, 20, 10, 0, 10, 20]
n = len(a)
periods = []
for i in range(2, n // 2 + 1):  # cycle's max length = 1/2 of sequence
    m = n // i
    word = a[:i]
    repeated_word = a[:i] * m
    same_size_array = a[:len(repeated_word)]
    isCycle = repeated_word == same_size_array
    if isCycle:
        periods.append(i)
    print(
        '%s-char word\t' % i, word,
        '\nRepeated word\t', repeated_word,
        '\nSame size array\t', same_size_array,
        '\nEqual(a Cycle)?\t', isCycle,
        '\n'
    )
period = periods[0]  # shortest cycle
print('Min period:', period, '\n', a[:period])
Output (long version):
2-char word [0, 10]
Repeated word [0, 10, 0, 10, 0, 10, 0, 10, 0, 10]
Same size array [0, 10, 20, 10, 0, 10, 20, 10, 0, 10]
Equal(a Cycle)? False
3-char word [0, 10, 20]
Repeated word [0, 10, 20, 0, 10, 20, 0, 10, 20]
Same size array [0, 10, 20, 10, 0, 10, 20, 10, 0]
Equal(a Cycle)? False
4-char word [0, 10, 20, 10]
Repeated word [0, 10, 20, 10, 0, 10, 20, 10]
Same size array [0, 10, 20, 10, 0, 10, 20, 10]
Equal(a Cycle)? True
5-char word [0, 10, 20, 10, 0]
Repeated word [0, 10, 20, 10, 0, 0, 10, 20, 10, 0]
Same size array [0, 10, 20, 10, 0, 10, 20, 10, 0, 10]
Equal(a Cycle)? False
Min period: 4
[0, 10, 20, 10]
The "pure" average periodicity is necessarily the length of the list divided by the number of occurrences of an element.
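As a quick illustration of that "pure" figure, here is a one-liner over the same sample list used below. The variable name `pure` is only illustrative.

```python
from collections import Counter

values = [0, 10, 20, 10, 0, 10, 20, 10, 0]
# average spacing of each value = list length / number of occurrences
pure = {k: len(values) / v for k, v in Counter(values).items()}
print(pure)  # {0: 3.0, 10: 2.25, 20: 4.5}
```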
We can also account for the first and last appearances and use them in our calculation, although this may affect your calculation in ways you might not want:
from collections import Counter
values = [0, 10, 20, 10, 0, 10, 20, 10, 0]
counts = Counter(values)
periodicities = dict()
r_values = values[::-1]
for k, v in counts.items():
    # values.index(k): position of the first occurrence;
    # r_values.index(k): distance of the last occurrence from the end
    periodicities[k] = (len(values) - r_values.index(k) - values.index(k) + 1) / v
print(periodicities)
Result:
{
0: 3.3333333333333335,
10: 2.0,
20: 3.0
}
Note: I'm assuming you're referring to exact periodicity rather than some measure of autocorrelation. E.g., [1, 5, 8, 1, 5, 8.0000000001] would have a period of 6 rather than 3.
This is by no means optimal, but in a pinch anyone can brute force a solution that looks something like the following.
def period(L):
    n = len(L)
    for i in range(1, n):
        if n%i:
            # save a little work if `i` isn't a factor of `n`
            continue
        if all(L[k:k+i]==L[:i] for k in range(0, n, i)):
            # `L` is just `L[:i]*x` for some `x`, and since `i` is
            # increasing this must be the smallest `i` where that
            # is true
            return i
    # if no factor suffices, the smallest generator is the entire list
    return n
With a little more effort we can get linear performance rather than quadratic. Optimizing it further is left as an exercise for somebody who isn't me.
def period(L):
    if not L:
        return 0
    guess = 1
    for i, x in enumerate(L[1:], 1):
        if x != L[i%guess]:
            """
            We know for certain the period is not `guess`. Moreover, if we've
            gotten this far we've ruled out every option less than `guess`.
            Additionally, no multiple of `guess` suffices because the fact
            that `L` consists of repetitions of width `guess` up till now means
            that `i%(t*guess)!=x` for any `t` so that `t*guess<i`. Interestingly,
            that's precisely the observation required to conclude
            `guess >= i+1`; there is some positive integer multiple of `guess`
            so that `L[:i+1]` consists of a copy of that width and some number
            of elements that are identical to that copy up to and excluding `x`.
            Since `L[:i+1]` has that structure, no width between that multiple
            of `guess` and `i` can generate `L`. Hence, the answer is at least
            `i+1`.
            """
            guess = i+1
            while len(L)%guess:
                """
                Additionally, the answer must be a factor of `len(L)`. This
                might superficially look quadratic because of the loop-in-a-loop
                aspect to it, but since `1<=guess<=len(L)` and `guess`
                is always increasing we can't possibly run this code more
                than some linear number of times.
                """
                guess += 1
            """
            If we've gotten this far, `guess` is a factor of `len(L)`, and it is
            exactly as wide as all the elements we've seen so far. If we
            continue iterating through `L` and find that it is just a bunch
            of copies of this initial segment then we'll be done. Otherwise,
            we'll find ourselves in this same if-statement and reset our
            `guess` again.
            """
    return guess
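Caution: tracing the linear version above on [1, 2, 1, 1, 2, 1], which is [1, 2, 1] * 2, suggests it returns 6 rather than 3. The mismatch at index 3 pushes `guess` to 4, and because 4 and 5 don't divide 6 it ends at 6, skipping 3. A standard linear-time alternative is the KMP prefix ("failure") function. This is a swapped-in technique, not the author's, and the name `period_kmp` is hypothetical. It keeps the same definition of period as the brute-force version: the smallest width that divides len(L).

```python
def period_kmp(L):
    """Smallest p dividing len(L) with L == L[:p] * (len(L) // p); 0 for an empty list."""
    n = len(L)
    if n == 0:
        return 0
    # pi[i] = length of the longest proper prefix of L[:i+1] that is also its suffix
    pi = [0] * n
    k = 0
    for i in range(1, n):
        while k > 0 and L[i] != L[k]:
            k = pi[k - 1]  # fall back to the next shorter border
        if L[i] == L[k]:
            k += 1
        pi[i] = k
    p = n - pi[-1]  # shortest shift that maps L onto itself
    return p if n % p == 0 else n


print(period_kmp([1, 2, 1, 1, 2, 1]))  # 3
print(period_kmp([0, 10, 20, 10] * 3))  # 4
```

Like the brute-force version, it returns len(L) when no width divides the list evenly. For example, it returns 11 for the 11-element list in the question, because 4 does not divide 11.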
If you want all periods, then those are simply every multiple of the minimum period which are also factors of the total length. Supposing you have a way to compute the prime factorization or all positive factors (including 1 and the integer itself) of a positive integer, the following routine can get you those. Actually finding the factors of an integer is probably out of scope and is well-answered elsewhere.
def all_periods(minimum_period, collection_size):
    p, n = minimum_period, collection_size
    if p == 0:
        yield 0
        return
    for f in positive_factors(n // p):  # integer division keeps the argument an int
        yield f * p
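The routine above assumes a `positive_factors` helper. A minimal trial-division version could look like this. The name comes from that routine, but the implementation is only a sketch and is fine for moderately sized lengths.

```python
def positive_factors(m):
    """All positive divisors of m in increasing order (trial division up to sqrt(m))."""
    small, large = [], []
    d = 1
    while d * d <= m:
        if m % d == 0:
            small.append(d)
            if d != m // d:
                large.append(m // d)
        d += 1
    return small + large[::-1]


# e.g. minimum period 3 in a collection of 12 elements
print([f * 3 for f in positive_factors(12 // 3)])  # [3, 6, 12]
```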
Initialize an empty list
interval = []
and use a recursive function, like so:
def check_for_interval(interval,list):
    ## step 1: add first list element into your interval
    interval.append(list[0])
    ## step 2: remove that element from your list
    list.pop(0)
    ## step 3: get the current content of your interval, plus the next
    ## element, and check if the concatenated content appears another time
    ## in the source list.
    ## first, make sure you got only strings in your list, for join to work
    str_interval = []
    for y in interval:
        str_interval.append(str(y))
    ## attach the next element, which now is the first one of the list
    ## because you popped the "new" first one above
    str_interval.append(str(list[0]))
    ## next, concatenate the list content as string, like so:
    current_interval = ",".join(str_interval)
    ## now, transform the current remaining list (except the "new" first
    ## element, because it was added to your test string above) into a string
    ## of the exact same structure (elements separated by commas)
    str_test = []
    list_test = list[1:]
    for z in list_test:
        str_test.append(str(z))
    ## next, concatenate the list content as string, like so:
    remaining_elements = ",".join(str_test)
    ## finally, check if the current_interval is present inside the
    ## remaining_elements. If yes
    if remaining_elements.find(current_interval) != -1:
        ## check if the amount of remaining elements is equal to the amount
        ## of elements constituting the interval -1 at the moment, OR if the
        ## current_interval is found in the remaining elements, its
        ## starting index is equal to 0, and the len of str_test is an
        ## even multiple of str_interval
        check_list_test = remaining_elements.split(",")
        check_list_interval = current_interval.split(",")
        ## (// keeps the repeat count an integer so it can multiply a list)
        if (len(str_interval) == len(str_test)) or (remaining_elements.find(current_interval) == 0 and len(str_test) % len(str_interval) == 0 and (len(str_test) / len(str_interval)) % 2 == 0 and (len(check_list_test) // len(check_list_interval)) * check_list_interval == check_list_test):
            ## If yes, attach the "new" first element of the list to the interval
            ## (that last element was included in str_interval, but is not yet
            ## present in interval)
            interval.append(list[0])
            ## and print the output
            print("your interval is: " + str(interval))
        else:
            ## otherwise, call the function recursively
            check_for_interval(interval,list)
    else:
        ## when the current interval is not found in the remaining elements,
        ## and the source list has been fully iterated (str_test's length
        ## == 0), this also means that we've found our interval
        if len(str_test) == 0:
            ## add the last list element into the interval
            interval.append(list[0])
            print("your interval is: " + str(interval))
        else:
            ## In all other cases, again call the function recursively
            check_for_interval(interval,list)
OPTIMIZED CODE ONLY, WITHOUT COMMENTS
def int_to_str_list(source):
    new_str_list = []
    for z in source:
        new_str_list.append(str(z))
    return new_str_list

def check_for_interval(interval,list):
    interval.append(list[0])
    list.pop(0)
    str_interval = int_to_str_list(interval)
    str_interval.append(str(list[0]))
    current_interval = ",".join(str_interval)
    str_test = int_to_str_list(list[1:])
    remaining_elements = ",".join(str_test)
    str_exam = remaining_elements.find(current_interval)
    if str_exam != -1:
        interval_size = len(str_interval)
        remaining_size = len(str_test)
        rem_div_inter = remaining_size // interval_size  # int, so it can multiply a list
        if (interval_size == remaining_size) or (str_exam == 0 and remaining_size % interval_size == 0 and rem_div_inter % 2 == 0 and rem_div_inter * str_interval == str_test):
            interval.append(list[0])
            print("your interval is: " + str(interval))
        else:
            check_for_interval(interval,list)
    else:
        if len(str_test) == 0:
            interval.append(list[0])
            print("your interval is: " + str(interval))
        else:
            check_for_interval(interval,list)
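To do what you want, initialize an empty list and call the function with your input list (my_list here is a placeholder for it; note that the function consumes the list it is given via pop):
interval = []
check_for_interval(interval, my_list)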
should work for pretty much any case, delivering you the interval as output.
Here is one way to approach this problem. Basically, you iterate n from 2 to len(lst)//2 and check whether the first n elements match every following block of n elements (and whether the leftover tail matches the start); if so, return n. If no match is found, return len(lst).
def get_periodicity(lst):
    t = len(lst)
    for n in range(2, t//2 + 1):
        for p in range(1, t//n):
            if lst[:n] != lst[n*p:n*p+n]:
                break
        else:
            rem = t%n
            if not rem or lst[-rem:] == lst[:rem]:
                return n
    else:
        return t
Tests
>>> get_periodicity([0, 10, 20, 10, 0, 10, 20, 10, 0, 10, 20])
4
>>> get_periodicity([1,1,2,1,1,2,1,1,2,1,1,2])
3
>>> get_periodicity([1,1,2,1,1,2,1,1,2,1,1,2,3])
13
Given an array of random integers
N = [1,...,n]
I need to find the minimum sum of two consecutive values using divide and conquer.
What is not working here but my IQ?
def minSum(array):
    if len(array) < 2:
        return array[0]+array[1]
    if (len(a)%2) != 0:
        mid = int(len(array)/2)
        leftArray = array[:mid]
        rightArray = array[mid+1:]
        return min(minSum(leftArray),minSum(rightArray),crossSum(array,mid))
    else:
        mid = int(len(array)/2)
        leftArray = array[:mid]
        rightArray = array[mid:]
        return min(minSum(leftArray), minSum(rightArray), array[mid]+array[mid+1])
def crossSum(array,mid):
    return min(array[mid-1]+array[mid],array[mid]+array[mid+1])
The main problem seems to be that the first condition is wrong: if len(array) < 2, then the following line is bound to raise an IndexError. Also, a is not defined. I assume that's the name of the array in the outer scope, so this does not raise an exception but silently uses the wrong array. Apart from that, the function seems to more or less work (I did not test it thoroughly, though).
However, you don't really need to check whether the array has odd or even length; you can use the same code for both cases, which makes the crossSum function unnecessary. If you really want a divide-and-conquer approach, try this:
def minSum(array):
    if len(array) < 2:
        return 10**100
    elif len(array) == 2:
        return array[0]+array[1]
    else:
        # len >= 3 -> both halves guaranteed non-empty
        mid = len(array) // 2
        leftArray = array[:mid]
        rightArray = array[mid:]
        return min(minSum(leftArray),
                   minSum(rightArray),
                   leftArray[-1] + rightArray[0])
import random
lst = [random.randint(1, 10) for _ in range(20)]
r = minSum(lst)
print(lst)
print(r)
Random example output:
[1, 5, 6, 4, 1, 2, 2, 10, 7, 10, 8, 4, 9, 5, 7, 6, 5, 1, 4, 9]
3
However, a simple loop would be much better suited for the problem:
def minSum(array):
    return min(array[i-1] + array[i] for i in range(1, len(array)))
For example, the digits of 123431 and 4577852 increase and then decrease. I wrote code that breaks the number into a list of digits and can tell whether all of the digits increase or all of the digits decrease, but I don't know how to check for digits increasing and then decreasing. How do I extend this?
x = int(input("Please enter a number: "))
y = [int(d) for d in str(x)]
def isDecreasing(y):
    for i in range(len(y) - 1):
        if y[i] < y[i + 1]:
            return False
    return True
if isDecreasing(y) == True or sorted(y) == y:
    print("Yes")
Find the maximum element.
Break the list into two pieces at that location.
Check that the first piece is increasing, the second decreasing.
For your second example, 4577852, you find the largest element, 8.
Break the list in two: 4577 and 852 (the 8 can go in either list, both, or neither).
Check that 4577 is increasing (okay) and 852 is decreasing (also okay).
Is that enough to get you to a solution?
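The three steps above can be sketched as follows. The function name is hypothetical. The comparisons are non-strict so that repeated digits such as the 77 in 4577852 are allowed. A purely increasing or purely decreasing number also passes, which matches the question's existing checks.

```python
def increases_then_decreases(n):
    d = [int(c) for c in str(n)]
    peak = d.index(max(d))  # step 1: position of the (first) largest digit
    left, right = d[:peak + 1], d[peak:]  # step 2: split there (peak kept in both)
    # step 3: left part non-decreasing, right part non-increasing
    rising = all(a <= b for a, b in zip(left, left[1:]))
    falling = all(a >= b for a, b in zip(right, right[1:]))
    return rising and falling


print(increases_then_decreases(123431), increases_then_decreases(4577852))
```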
Seems like a good opportunity to learn about using itertools and generator pipelines. First we make a few simple, decoupled, and reusable components:
from itertools import tee, groupby

def digits(n):
    """420 -> 4, 2, 0"""
    for char in str(n):
        yield int(char)

def pairwise(iterable):
    """s -> (s0,s1), (s1,s2), (s2, s3), ..."""
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)

def deltas(pairs):
    """2 5 3 4 -> 3, -2, 1"""
    for left, right in pairs:
        yield right - left

def directions(deltas):
    """3 2 2 5 6 -> -1, 0, 1, 1"""
    for delta in deltas:
        yield -1 if delta < 0 else +1 if delta > 0 else 0

def deduper(directions):
    """3 2 2 5 6 2 2 2 -> 3, 2, 5, 6, 2"""
    for key, group in groupby(directions):
        yield key
Then we put the pieces together to solve the wider problem of detecting an "increasing then decreasing number":
from itertools import zip_longest

def is_inc_dec(stream, expected=(+1, -1)):
    stream = pairwise(stream)
    stream = deltas(stream)
    stream = directions(stream)
    stream = deduper(stream)
    for actual, expected in zip_longest(stream, expected):
        if actual != expected or actual is None or expected is None:
            return False
    else:
        return True
Usage is like this:
>>> stream = digits(123431)
>>> is_inc_dec(stream)
True
This solution will short-circuit correctly for a number like:
121111111111111111111111111111111111111111111111111...2
I've addressed only the "strictly increasing, and then strictly decreasing" number case. Since this sounds like it might be your homework, I'll leave it as an exercise for you to adapt the code for the "non-decreasing and then non-increasing" case which is mentioned in the question title.
Split the list at the maximum value, then take the min/max of the diff of each side:
import numpy as np
test1 = [1, 2, 3, 4, 5, 8, 7, 3, 1, 0]
test2 = [1, 2, 3, 4, 5, 8, 7, 3, 1, 0, 2, 5]
test3 = [7, 1, 2, 3, 4, 5, 8, 7, 3, 1, 0]
test4 = [1, 2, 3, 4, 5, 8, 8, 7, 3, 1, 0]
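def incdec_test(x):
    x = np.asarray(x)
    i = x.argmax()  # index of the first occurrence of the maximum
    left = np.diff(x[:i + 1])  # steps up to and including the peak
    right = np.diff(x[i:])  # steps from the peak to the end (last element included)
    # an empty diff (peak at either end) counts as trivially monotonic
    return bool((left.size == 0 or left.min() >= 0) and (right.size == 0 or right.max() <= 0))

for test in [test1, test2, test3, test4]:
    print('increase then decrease = {}'.format(incdec_test(test)))
Results:
increase then decrease = True
increase then decrease = False
increase then decrease = False
increase then decrease = True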
How can I randomly shuffle a list so that none of the elements remains in its original position?
In other words, given a list A with distinct elements, I'd like to generate a permutation B of it so that
this permutation is random
and for each n, a[n] != b[n]
e.g.
a = [1,2,3,4]
b = [4,1,2,3] # good
b = [4,2,1,3] # good
a = [1,2,3,4]
x = [2,4,3,1] # bad
I don't know the proper term for such a permutation (is it "total"?), so I'm having a hard time googling. Update: the correct term appears to be "derangement".
After some research I was able to implement the "early refusal" algorithm as described e.g. in this paper [1]. It goes like this:
import random
def random_derangement(n):
    while True:
        v = [i for i in range(n)]
        for j in range(n - 1, -1, -1):
            p = random.randint(0, j)
            if v[p] == j:
                break
            else:
                v[j], v[p] = v[p], v[j]
        else:
            if v[0] != 0:
                return tuple(v)
The idea is that we keep shuffling the array; as soon as we find that the permutation we're working on is not valid (v[i]==i), we break and start from scratch.
A quick test suggests that this algorithm generates all derangements uniformly:
import itertools
N = 4
# enumerate all derangements for testing
counter = {}
for p in itertools.permutations(range(N)):
    if all(p[i] != i for i in range(N)):
        counter[p] = 0
# make M probes for each derangement
M = 5000
for _ in range(M*len(counter)):
    # generate a random derangement
    p = random_derangement(N)
    # is it really?
    assert p in counter
    # ok, record it
    counter[p] += 1
# the distribution looks uniform
for p, c in sorted(counter.items()):
    print(p, c)
Results:
(1, 0, 3, 2) 4934
(1, 2, 3, 0) 4952
(1, 3, 0, 2) 4980
(2, 0, 3, 1) 5054
(2, 3, 0, 1) 5032
(2, 3, 1, 0) 5053
(3, 0, 1, 2) 4951
(3, 2, 0, 1) 5048
(3, 2, 1, 0) 4996
I chose this algorithm for its simplicity; this presentation [2] briefly outlines other ideas.
References:
[1] An analysis of a simple algorithm for random derangements. Merlini, Sprugnoli, Verri. WSPC Proceedings, 2007.
[2] Generating random derangements. Martínez, Panholzer, Prodinger.
Such permutations are called derangements. In practice you can just try random permutations until you hit a derangement: the fraction of permutations that are derangements approaches 1/e as n grows, so on average about e ≈ 2.72 attempts are needed.
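The 1/e claim can be checked numerically with the standard derangement recurrence D(n) = (n - 1)(D(n - 1) + D(n - 2)), where D(0) = 1 and D(1) = 0. The helper name is hypothetical; the script prints D(n), the ratio D(n)/n!, and 1/e for comparison.

```python
from math import e, factorial


def count_derangements(n):
    """Number of derangements of n items via D(n) = (n - 1) * (D(n - 1) + D(n - 2))."""
    if n == 0:
        return 1
    d_prev2, d_prev1 = 1, 0  # D(0), D(1)
    for k in range(2, n + 1):
        d_prev2, d_prev1 = d_prev1, (k - 1) * (d_prev1 + d_prev2)
    return d_prev1


for n in range(2, 11):
    d = count_derangements(n)
    print(n, d, d / factorial(n))
print("1/e =", 1 / e)
```

The ratio settles near 0.3679 after only a few terms. This is why rejection sampling needs so few attempts on average.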
As a possible starting point, the Fisher-Yates shuffle goes like this.
import random

def swap(xs, a, b):
    xs[a], xs[b] = xs[b], xs[a]

def permute(xs):
    for a in range(len(xs)):
        b = random.choice(range(a, len(xs)))
        swap(xs, a, b)
Perhaps this will do the trick?
def derange(xs):
    for a in range(len(xs) - 1):
        b = random.choice(range(a + 1, len(xs) - 1))
        swap(xs, a, b)
    swap(xs, len(xs) - 1, random.choice(range(len(xs) - 1)))
Here's the version described by Vatine:
def derange(xs):
    for a in range(1, len(xs)):
        b = random.choice(range(0, a))
        swap(xs, a, b)
    return xs
A quick statistical test:
from collections import Counter
def test(n):
    derangements = (tuple(derange(list(range(n)))) for _ in range(10000))
    for k, v in Counter(derangements).items():
        print('{} {}'.format(k, v))
test(4):
(1, 3, 0, 2) 1665
(2, 0, 3, 1) 1702
(3, 2, 0, 1) 1636
(1, 2, 3, 0) 1632
(3, 0, 1, 2) 1694
(2, 3, 1, 0) 1671
This does appear uniform over its range, and it has the nice property that each element has an equal chance to appear in each allowed slot.
But unfortunately, it doesn't include all of the derangements. There are 9 derangements of size 4. (The formula and an example for n=4 are given in the Wikipedia article.)
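The "swap with an earlier position" version above appears to be Sattolo's algorithm. That algorithm produces only permutations consisting of a single n-cycle, and there are (n - 1)! = 6 of those for n = 4. This matches the six tuples in the test output. The enumeration below (helper names are hypothetical) lists which derangements of size 4 are single cycles and which are not.

```python
from itertools import permutations


def is_derangement(p):
    return all(p[i] != i for i in range(len(p)))


def is_single_cycle(p):
    # follow 0 -> p[0] -> p[p[0]] -> ... and count the steps until we return to 0
    i, steps = p[0], 1
    while i != 0:
        i, steps = p[i], steps + 1
    return steps == len(p)


n = 4
derangements = [p for p in permutations(range(n)) if is_derangement(p)]
cycles = [p for p in derangements if is_single_cycle(p)]
print(len(derangements), len(cycles))  # 9 6
print([p for p in derangements if not is_single_cycle(p)])
```

The three derangements that are never produced are the pairs of swaps (1, 0, 3, 2), (2, 3, 0, 1) and (3, 2, 1, 0).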
This should work
import random
totalrandom = False
array = [1, 2, 3, 4]
it = 0
while totalrandom == False:
    it += 1
    shuffledArray = sorted(array, key=lambda k: random.random())
    total = 0
    # array[i-1] works as an index here because array holds 1..len(array)
    for i in array:
        if array[i-1] != shuffledArray[i-1]: total += 1
    if total == 4:  # 4 == len(array): every position changed
        totalrandom = True
    if it > 10*len(array):
        print("'Total random' shuffle impossible")
        exit()
print(shuffledArray)
Note the variable it, which makes the code exit if too many iterations are needed. This accounts for arrays such as [1, 1, 1] or [3].
EDIT
It turns out that if you're using this with large arrays (bigger than 15 or so), it will be CPU intensive. With a randomly generated 100-element array and the limit raised to len(array)**3, it takes my Samsung Galaxy S4 a long time to solve.
EDIT 2
After about 1200 seconds (20 minutes), the program ended saying 'Total Random shuffle impossible'. For large arrays, you need a very large number of permutations... Say len(array)**10 or something.
Code:
import random, time
totalrandom = False
array = []
it = 0
for i in range(1, 100):
    array.append(random.randint(1, 6))
start = time.time()
while totalrandom == False:
    it += 1
    shuffledArray = sorted(array, key=lambda k: random.random())
    total = 0
    for i in array:
        if array[i-1] != shuffledArray[i-1]: total += 1
    if total == 4:
        totalrandom = True
    if it > len(array)**3:
        end = time.time()
        print(end-start)
        print("'Total random' shuffle impossible")
        exit()
end = time.time()
print(end-start)
print(shuffledArray)
Here is a shorter one, with Pythonic syntax:
import random
def derange(s):
    d = s[:]
    while any(a == b for a, b in zip(d, s)):
        random.shuffle(d)
    return d
All it does is shuffle the list until there is no element-wise match. Be careful: it will run forever if it is passed a list that cannot be deranged, such as a one-element list or a list with duplicates in which one value fills more than half of the positions. To remove duplicates, simply call the function like this: derange(list(set(my_list_to_be_deranged))).
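Duplicates alone don't make a derangement impossible; [1, 1, 2, 2] can become [2, 2, 1, 1], for example. A reordering in which no position keeps its value exists exactly when no value occurs more than len(lst) / 2 times. A hypothetical guard (the name `has_derangement` is mine) could be checked before calling the shuffle loop above, so that it cannot spin forever.

```python
from collections import Counter


def has_derangement(lst):
    """True if some reordering of lst leaves no position holding its original value."""
    if not lst:
        return True  # the empty list is trivially deranged
    most_common_count = Counter(lst).most_common(1)[0][1]
    # possible exactly when the most frequent value fills at most half the positions
    return 2 * most_common_count <= len(lst)


print(has_derangement([1, 2, 3, 4]), has_derangement([1, 1, 2]))
```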
import random
a = [1, 2, 3, 4]
c = []
i = 0
while i < len(a):
    while 1:
        k = random.choice(a)
        # print(k, a[i])
        if k == a[i]:
            pass
        else:
            if k not in c:
                if i == len(a)-2:
                    if a[len(a)-1] not in c:
                        if k == a[len(a)-1]:
                            c.append(k)
                            break
                    else:
                        c.append(k)
                        break
                else:
                    c.append(k)
                    break
    i = i+1
print(c)
A quick way is to shuffle your list repeatedly until you are left with a list that satisfies your condition.
import random
import copy
def is_derangement(l_original, l_proposal):
    return all([l_original[i] != item for i, item in enumerate(l_proposal)])
l_original = [1, 2, 3, 4, 5]
l_proposal = copy.copy(l_original)
while not is_derangement(l_original, l_proposal):
    random.shuffle(l_proposal)
print(l_proposal)