Modifying references to lists - python

For an algorithm I'm benchmarking I need to test some portion of a list (which could be very long, but is filled mostly with 0s and the occasional 1). The idea is that in a list of n items, with d of them being of interest, each item is defective with probability d/n in expectation. So, check a group of size roughly n/d (it's defined in terms of the floor and log functions for information-theoretic reasons; it makes the analysis of the algorithm easier).
Algorithm:
1./ If n <= 2*d - 2 (i.e. more than half the list is filled with 1s), just look at each item in turn.
2./ If n > 2*d - 2: check a group of size 2^alpha (alpha = floor(binarylog(l/d)), l = n - d + 1, d = number of 1s). If there is a 1, do binary search on the group to find the defective and set d = d - 1 and n = n - 1 - x (x = size of the group minus the defective). If there isn't a 1, set n = n - groupSize and go to 1 (i.e. check the rest of the list).
However, when populating the list with 10 1s in random places, the algorithm finds all but a single 1 and then continues to loop whilst checking an empty list.
I think the problem is that when discarding a group containing all 0s I'm not correctly modifying the reference that says where to start for the next round, and this is causing my algorithm to fail.
Here is the relevant part of the function:
import math
def binary_search(inList):
    low = 0
    high = len(inList)
    while low < high:
        mid = (low + high) // 2
        upper = inList[mid:high]
        lower = inList[low:mid]
        if any(lower):
            high = mid
        elif any(upper):
            low = mid + 1
        elif mid == 1:
            return mid
        else:
            # Neither side has a 1
            return -1
    return mid
def HGBSA(inList, num_defectives):
    n = len(inList)
    defectives = []
    #initialising the start of the group to be tested
    start = 0
    while num_defectives > 0:
        defective = 0
        if(n <= (2*num_defectives - 2)):
            for i in inList:
                if i == 1:
                    num_defectives = num_defectives - 1
                    n = n - 1
                    defectives.append(i)
        else:
            #params to determine size of group
            l = n - num_defectives + 1
            alpha = int(math.floor(math.log(l/num_defectives, 2)))
            groupSize = 2**alpha
            end = start + groupSize
            group = inList[start:end]
            #print(groupSize)
            #print(group)
            if any(group):
                defective = binary_search(group)
                defective = start + defective
                defectives.append(defective)
                undefectives = [s for s in group if s != 1]
                n = n - 1 - len(undefectives)
                num_defectives = num_defectives - 1
                print(defectives)
            else:
                n = n - groupSize
            start = start + groupSize
    print(defectives)
    return defectives
Also here are the tests that the function currently passes:
from GroupTesting import HGBSA
#idenitify a single defective
inlist = [0]*1024
inlist[123] = 1
assert HGBSA(inlist, 1) == [123]
#identify two defectives
inlist = [0]*1024
inlist[123] = 1
inlist[789] = 1
assert inlist[123] == 1
assert inlist[789] == 1
assert HGBSA(inlist, 2) == [123, 789]
zeros = [0]*1024
ones = [1, 101, 201, 301, 401, 501, 601, 701, 801, 901]
for val in ones:
    zeros[val] = 1
assert HGBSA(zeros, 10) == ones
I.e. it finds a single 1, 2 and 10 1s deterministically placed in the list, but this test:
zeros = [0] * 1024
ones = [1] * 10
l = zeros + ones
shuffle(l)
where_the_ones_are = [i for i, x in enumerate(l) if x == 1]
assert HGBSA(l, 10) == where_the_ones_are
Has exposed the bug.
This test also fails with the code above
#identify two defectives next to each other
inlist = [0]*1024
inlist[123] = 1
inlist[124] = 1
assert GT(inlist, 2) == [123, 124]
The following modification (discarding a whole group if it is undefective, but only discarding the members of a group before the defective) passes the 'two next to each other' test, but not the '10 in a row' or random tests:
def HGBSA(inList, num_defectives):
    n = len(inList)
    defectives = []
    #initialising the start of the group to be tested
    start = 0
    while num_defectives > 0:
        defective = 0
        if(n <= (2*num_defectives - 2)):
            for i in inList:
                if i == 1:
                    num_defectives = num_defectives - 1
                    n = n - 1
                    defectives.append(i)
        else:
            #params to determine size of group
            l = n - num_defectives + 1
            alpha = int(math.floor(math.log(l/num_defectives, 2)))
            groupSize = 2**alpha
            end = start + groupSize
            group = inList[start:end]
            #print(groupSize)
            #print(group)
            if any(group):
                defective = binary_search(group)
                defective = start + defective
                defectives.append(defective)
                undefectives = [s for s in group if s != 1 in range(0, groupSize//2)]
                print(len(undefectives))
                n = n - 1 - len(undefectives)
                num_defectives = num_defectives - 1
                start = start + defective + 1
                #print(defectives)
            else:
                n = n - groupSize
                start = start + groupSize
    print(defectives)
    return defectives
I.e. the problem arises when a tested group contains multiple 1s: after the first one is found, none of the others are detected. The best test for the code to pass would be one where the 1s are distributed uniformly at random throughout the list and all defectives are found.
Also, how would I create tests to catch this kind of error in future?

Your algorithm seemingly has worse performance than a linear scan.
A naïve algorithm would just scan a piece of the list of size n/d in O(n/d).
defectives = [index for (index, element) in enumerate(inList[start:end], start) if element == 1]
Common sense says that you can't possibly detect the positions of all 1s in a list without looking at every element of the list once, and there's no point in looking at it more than once.
Your "binary search" uses any multiple times, effectively scanning pieces of the list multiple times. Same applies to constructs like if any(group): ... [s for s in group if ...] which scan group twice, first time needlessly.
If you described the actual algorithm you're trying to implement, people could help troubleshoot it. From your code and your post, the algorithm is unclear. The fact that your HGBSA function is long and not exactly commented unfortunately does not help understanding.
Don't be afraid to tell people here the details of what your algorithm is doing and why; we're sort of computer geeks here, too, we're going to understand :)
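On the question of how to write tests that catch this kind of error: one common approach is a randomized test against a trivially correct brute-force oracle. The following is only a minimal sketch and not part of the original post. The helper names (brute_force_positions, random_cases, find_counterexample) are hypothetical, and finder stands for any function with the same (list, num_defectives) signature as HGBSA.

```python
import random

def brute_force_positions(bits):
    """Trivially correct oracle: indices of every 1 in the list."""
    return [i for i, b in enumerate(bits) if b == 1]

def random_cases(trials, max_len, seed=0):
    """Yield (bits, d) pairs with d ones placed uniformly at random."""
    rng = random.Random(seed)  # fixed seed keeps failures reproducible
    for _ in range(trials):
        n = rng.randint(1, max_len)
        d = rng.randint(0, n)
        bits = [1] * d + [0] * (n - d)
        rng.shuffle(bits)
        yield bits, d

def find_counterexample(finder, trials=200, max_len=64):
    """Return the first input on which finder disagrees with the oracle, else None."""
    for bits, d in random_cases(trials, max_len):
        if sorted(finder(bits, d)) != brute_force_positions(bits):
            return bits
    return None
```

Small random lists tend to put several 1s into the same group, which is the situation described in the question. Since a buggy implementation may loop forever, it is probably worth running such a test under a timeout.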

Related

Find single number in pairs of unique numbers of a Python list in O(lg n)

I have a question for Divide and Conquering in programming algorithms. Suppose you are given a random integer list in Python which consists of:
Unique contiguous pairs of integers
A single integer somewhere in the list
And the conditions are exclusive, meaning while [2,2,1,1,3,3,4,5,5,6,6] is valid, these are not:
[2,2,2,2,3,3,4] (violates condition 1: because there are two pairs of 2s while there can only be a maximum of 1 pair of any number)
[1,4,4,5,5,6,6,1] (violates condition 1: because there is a pair of 1s but they are not contiguous).
[1,4,4,5,5,6,6,3] (violates condition 2: there are 2 single numbers, 1 and 3)
Now the question is can you find the 'single' number index in an O(lgn) algorithm?
My original jab is this:
def single_num(array, arr_max_len):
    i = 0
    while (i < arr_max_len):
        if (arr_max_len - i == 1):
            return i
        elif (array[i] == array[i + 1]):
            i = i + 2
        else:
            return i # don't have to worry about odd index because it will never happen
    return None
However, the algorithm seems to run at O(n/2) time, which seems like the best it could do.
Even if I use divide and conquer, I don't think it's going to get better than O(n/2) time, unless there's some method that's beyond my scope of comprehension at the moment.
Does anyone have a better idea, or can I arguably say that this is already O(log n) time?
EDIT: It seems like Manuel has the best solution. If allowed, I'll take some time to implement a solution myself for understanding, and then accept Manuel's answer.
Solution
Just binary search the even indexes to find the first whose value differs from the next value.
from bisect import bisect
def single_num(a):
    class E:
        def __getitem__(_, i):
            return a[2*i] != a[2*i+1]
    return 2 * bisect(E(), False, 0, len(a)//2)
Explanation
Visualization of the virtual "list" E() that I'm searching on:
       0  1   2  3   4  5   6  7   8  9   10    (indices)
a   = [2, 2,  1, 1,  3, 3,  4, 5,  5, 6,  6]
E() = [False, False, False, True,  True]
       0      1      2      3      4            (indices)
In the beginning, the pairs match (so != results in False-values). Starting with the single number, the pairs don't match (so != returns True). Since False < True, that's a sorted list which bisect happily searches in.
Alternative implementation
Without bisect, if you're not yet tired of writing binary searches:
def single_num(a):
    i, j = 0, len(a) // 2
    while i < j:
        m = (i + j) // 2
        if a[2*m] == a[2*m+1]:
            i = m + 1
        else:
            j = m
    return 2*i
Sigh...
I wish bisect would support giving it a callable so I could just do return 2 * bisect(lambda i: a[2*i] != a[2*i+1], False, 0, len(a)//2). Ruby does, and it's perhaps the most frequent reason I sometimes solve coding problems with Ruby instead of Python.
Testing
Btw I tested both with all possible cases for up to 1000 pairs:
from random import random
for pairs in range(1001):
    a = [x for _ in range(pairs) for x in [random()] * 2]
    single = random()
    assert len(set(a)) == pairs and single not in a
    for i in range(0, 2*pairs+1, 2):
        a.insert(i, single)
        assert single_num(a) == i
        a.pop(i)
A lg n algorithm is one in which you split the input into smaller parts, and discard some of the smaller part such that you have a smaller input to work with. Since this is a searching problem, the likely solution for a lg n time complexity is binary search, in which you split the input in half each time.
My approach is to start off with a few simple cases, to spot any patterns that I can make use of.
In the following examples, the largest integer is the target number.
# input size: 3
[1,1,2]
[2,1,1]
# input size: 5
[1,1,2,2,3]
[1,1,3,2,2]
[3,1,1,2,2]
# input size: 7
[1,1,2,2,3,3,4]
[1,1,2,2,4,3,3]
[1,1,4,2,2,3,3]
[4,1,1,2,2,3,3]
# input size: 9
[1,1,2,2,3,3,4,4,5]
[1,1,2,2,3,3,5,4,4]
[1,1,2,2,5,3,3,4,4]
[1,1,5,2,2,3,3,4,4]
[5,1,1,2,2,3,3,4,4]
You probably notice that the input size is always an odd number i.e. 2*x + 1.
Since this is a binary search, you can check if the middle number is your target number. If the middle number is the single number (if middle_number != left_number and middle_number != right_number), then you have found it. Otherwise, you have to search the left side or the right side of the input.
Notice that in the sample test cases above, in which the middle number is not the target number, there is a pattern between the middle number and its pair.
For input size 3 (2*1 + 1), if middle_number == left_number, the target number is on the right, and vice versa.
For input size 5 (2*2 + 1), if middle_number == left_number, the target number is on the left, and vice versa.
For input size 7 (2*3 + 1), if middle_number == left_number, the target number is on the right, and vice versa.
For input size 9 (2*4 + 1), if middle_number == left_number, the target number is on the left, and vice versa.
That means the parity of x in 2*x + 1 (the array length) affects whether to search the left or right side of the input: search the right if x is odd and search the left if x is even, if middle_number == left_number (and vice versa).
Based on all this information, you can come up with a recursive solution. Note that you have to ensure that the input size is odd in each recursive call. (Edit: Ensuring that the input size is odd makes the code even messier. You probably want to come up with a solution in which the parity of the input size does not matter.)
def find_single_number(array: list, start_index: int, end_index: int):
    # base case: array length == 1
    if start_index == end_index:
        return start_index
    middle_index = (start_index + end_index) // 2
    # base case: found target
    if array[middle_index] != array[middle_index - 1] and array[middle_index] != array[middle_index + 1]:
        return middle_index
    # make use of parity of array length to search left or right side
    # end_index == array length - 1
    x = (end_index - start_index) // 2
    # ensure array length is odd
    include_middle = (middle_index % 2 == 0)
    if array[middle_index] == array[middle_index - 1]: # middle == number on its left
        if x % 2 == 0: # x is even
            # search left side
            return find_single_number(
                array,
                start_index,
                middle_index if include_middle else middle_index - 1
            )
        else: # x is odd
            # search right side
            return find_single_number(
                array,
                middle_index if include_middle else middle_index + 1,
                end_index,
            )
    else: # middle == number on its right
        if x % 2 == 0: # x is even
            # search right side
            return find_single_number(
                array,
                middle_index if include_middle else middle_index + 1,
                end_index,
            )
        else: # x is odd
            # search left side
            return find_single_number(
                array,
                start_index,
                middle_index if include_middle else middle_index - 1
            )
# test out the code
if __name__ == '__main__':
    array = [2,2,1,1,3,3,4,5,5,6,6] # target: 4 (index: 6)
    print(find_single_number(array, 0, len(array) - 1))
    array = [1,1,2] # target: 2 (index: 2)
    print(find_single_number(array, 0, len(array) - 1))
    array = [1,1,3,2,2] # target: 3 (index: 2)
    print(find_single_number(array, 0, len(array) - 1))
    array = [1,1,4,2,2,3,3] # target: 4 (index: 2)
    print(find_single_number(array, 0, len(array) - 1))
    array = [5,1,1,2,2,3,3,4,4] # target: 5 (index:0)
    print(find_single_number(array, 0, len(array) - 1))
My solution is probably not the most efficient or elegant, but I hope my explanation helps you understand the approach to tackling this kind of algorithmic problem.
Proof that it has a time complexity of O(lg n):
Let's assume that the most important operation is the comparison of the middle number against the left and right number (if array[middle_index] != array[middle_index - 1] and array[middle_index] != array[middle_index + 1]), and that it has a time cost of 1 unit. Let us refer to this comparison as the main comparison.
Let T be time cost of the algorithm.
Let n be the length of the array.
Since this solution involves recursion, there is a base case and recursive case.
For the base case (n = 1), it is just the main comparison, so:
T(1) = 1.
For the recursive case, the input is split in half (either left half or right half) each time; at the same time, there is one main comparison. So:
T(n) = T(n/2) + 1
Now, I know that the input size must always be odd, but let us assume that n = 2^k for simplicity; the time complexity would still be the same.
We can rewrite T(n) = T(n/2) + 1 as:
T(2^k) = T(2^(k-1)) + 1
Also, T(1) = 1 is:
T(2^0) = 1
When we expand T(2^k) = T(2^(k-1)) + 1, we get:
T(2^k)
= T(2^(k-1)) + 1
= [T(2^(k-2)) + 1] + 1 = T(2^(k-2)) + 2
= [T(2^(k-3)) + 1] + 2 = T(2^(k-3)) + 3
= [T(2^(k-4)) + 1] + 3 = T(2^(k-4)) + 4
= ...(repeat until k)
= T(2^(k-k)) + k = T(2^0) + k = k + 1
Since n = 2^k, that means k = log2 n.
Substituting n back in, we get:
T(n) = log2 n + 1
1 is a constant so it can be dropped; same goes for the base of the log operation.
Therefore, the upper bound of the time complexity of the algorithm is:
T(n) = O(lg n)
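The logarithmic bound can also be checked empirically. The sketch below does not instrument the recursive function above; for simplicity it counts loop iterations of an iterative binary search over pair indices and compares them with floor(log2(pairs)) + 1, which is what int.bit_length() returns. The helper names are made up for this illustration.

```python
def single_num_with_steps(a):
    """Binary search over pair indices; returns (index, loop iterations)."""
    i, j = 0, len(a) // 2
    steps = 0
    while i < j:
        steps += 1
        m = (i + j) // 2
        if a[2 * m] == a[2 * m + 1]:
            i = m + 1   # pair m is intact, the single number is to the right
        else:
            j = m       # pair m is broken, the single number is at or before it
    return 2 * i, steps

def make_case(pairs, single_pos):
    """Valid input with `pairs` pairs and the single number at index 2 * single_pos."""
    a = [v for k in range(pairs) for v in (k, k)]
    a.insert(2 * single_pos, -1)
    return a

for pairs in (1, 10, 1000, 100000):
    worst = max(single_num_with_steps(make_case(pairs, pos))[1]
                for pos in (0, pairs // 2, pairs))
    # the iteration count never exceeds floor(log2(pairs)) + 1
    print(pairs, worst, pairs.bit_length())
```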

Is there a method to find the longest substring or sublist that is the same when mirrored?

I need to find the length of the longest sublist that reads the same when mirrored, given the number of elements.
Example:
n = 5
my_list = [1,2,3,2,1]
Here's my code:
n = int(input())
my_list = list(map(int, input().split()))
c = 0
s1 = my_list
x = 0
i = 0
while i < n:
    s2 = s1[i:]
    if s2 == s2[::-1]:
        if c <= len(s2):
            c = len(s2)
    if i >= n-1:
        i = 0
        n = n - 1
        s1 = s1[:-1]
    i += 1
print(c)
As we can see, the list above is the same when mirrored. But when n = 10 and my_list = [1,2,3,2,1,332,6597,6416,614,31], the result is 3 instead of the expected 5.
My solution would be splitting the array in each iteration into a left and a right array, and then reversing the left array.
Next, compare each element from each array and increment the length variable by one while the elements are the same.
def longest_subarr(a):
    longest_exclude = 0
    for i in range(1, len(a) - 1):
        # this excludes a[i] as the root
        left = a[:i][::-1]
        # this also excludes a[i], needs to consider this in calculation later
        right = a[i + 1:]
        max_length = min(len(left), len(right))
        length = 0
        while(length < max_length and left[length] == right[length]):
            length += 1
        longest_exclude = max(longest_exclude, length)
    # times 2 because the current longest is for the half of the array
    # plus 1 to include to root
    longest_exclude = longest_exclude * 2 + 1
    longest_include = 0
    for i in range(1, len(a)):
        # this excludes a[i] as the root
        left = a[:i][::-1]
        # this includes a[i]
        right = a[i:]
        max_length = min(len(left), len(right))
        length = 0
        while(length < max_length and left[length] == right[length]):
            length += 1
        longest_include = max(longest_include, length)
    # times 2 because the current longest is for the half of the array
    longest_include *= 2
    return max(longest_exclude, longest_include)
print(longest_subarr([1, 4, 3, 5, 3, 4, 1]))
print(longest_subarr([1, 4, 3, 5, 5, 3, 4, 1]))
print(longest_subarr([1, 3, 2, 2, 1]))
This covers the test cases for odd-length sub-arrays like [a, b, a] and even-length sub-arrays like [a, b, b, a].
Since you need the longest sequence that can be mirrored, here is a simple O(n^2) approach for this.
Go to each index, consider it as the center, and expand towards both left and right, one step at a time, if the numbers are equal. Or else break, and move onto the next index.
def longest_mirror(my_array):
    maxLength = 1
    start = 0
    length = len(my_array)
    low = 0
    high = 0
    # One by one consider every element as center point of mirrored subarray
    for i in range(1, length):
        # checking for even length subarrays
        low = i - 1
        high = i
        while low >= 0 and high < length and my_array[low] == my_array[high]:
            if high - low + 1 > maxLength:
                start = low
                maxLength = high - low + 1
            low -= 1
            high += 1
        # checking for odd length subarrays
        low = i - 1
        high = i + 1
        while low >= 0 and high < length and my_array[low] == my_array[high]:
            if high - low + 1 > maxLength:
                start = low
                maxLength = high - low + 1
            low -= 1
            high += 1
    return maxLength

Fibonacci series in bit string

I am working on a Fibonacci series of bit strings, which can be represented as:
f(0)=0;
f(1)=1;
f(2)=10;
f(3)=101;
f(4)=10110;
f(5)=10110101;
Secondly, I have a pattern, for example '10', and want to count how many times it occurs in a particular string of the series. For example, the Fibonacci string for 5 is '10110101', so '10' occurs 3 times.
My code runs correctly without error, but it cannot run for values larger than n=45; I want to run it for n=100.
Can anyone help? I only want to calculate the number of occurrences.
n=5
fibonacci_numbers = ['0', '1']
for i in range(1,n):
    fibonacci_numbers.append(fibonacci_numbers[i]+fibonacci_numbers[i-1])
    #print(fibonacci_numbers[-1])
print(fibonacci_numbers[-1])
nStr = str (fibonacci_numbers[-1])
pattern = '10'
count = 0
flag = True
start = 0
while flag:
    a = nStr.find(pattern, start)
    if a == -1:
        flag = False
    else:
        count += 1
        start = a + 1
print(count)
This is a fun one! The trick is that you don't actually need that giant bit string, just the number of 10s it contains and the edges. This solution runs in O(n) time and O(1) space.
from typing import NamedTuple
class FibString(NamedTuple):
    """First digit, last digit, and the number of 10s in between."""
    first: int
    tens: int
    last: int
def count_fib_string_tens(n: int) -> int:
    """Count the number of 10s in a n-'Fibonacci bitstring'."""
    def combine(b: FibString, a: FibString) -> FibString:
        """Combine two FibStrings."""
        tens = b.tens + a.tens
        # mind the edges!
        if b.last == 1 and a.first == 0:
            tens += 1
        return FibString(b.first, tens, a.last)
    # First two values are 0 and 1 (tens=0 for both)
    a, b = FibString(0, 0, 0), FibString(1, 0, 1)
    for _ in range(1, n):
        a, b = b, combine(b, a)
    return b.tens # tada!
I tested this against your original implementation and sure enough it produces the same answers for all values that the original function is able to calculate (but it's about eight orders of magnitude faster by the time you get up to n=40). The answer for n=100 is 218922995834555169026 and it took 0.1ms to calculate using this method.
The nice thing about the Fibonacci sequence that will solve your issue is that you only need the last two values of the sequence. 10110 is made by combining 101 and 10. After that 10 is no longer needed. So instead of appending, you can just keep the two values. Here is what I've done:
n=45
fibonacci_numbers = ['0', '1']
for i in range(1,n):
    temp = fibonacci_numbers[1]
    fibonacci_numbers[1] = fibonacci_numbers[1] + fibonacci_numbers[0]
    fibonacci_numbers[0] = temp
Note that it still uses a decent amount of memory, but it didn't give me a memory error (it does take a bit of time to run though).
I also wasn't able to print the full string as I got an OSError [Errno 5] Input/Output error but it can still count and print that output.
For larger numbers, storing as a string is going to quickly cause a memory issue. In that case, I'd suggest doing the fibonacci sequence with plain integers and then converting to bits. See here for tips on binary conversion.
While the regular fibonacci sequence doesn't work in a direct sense, consider that 10 is 2 and 101 is 5. 5+2 doesn't work - you want 10110 or an or operation 10100 | 10 yielding 22; so if you shift one by the length of the other, you can get the result. See for example
x = 5
y = 2
(x << 2) | y
>> 22
Shifting x by the number of bits representing y and then doing a bitwise or with | solves the issue. Python summarizes these bitwise operations well here. All that's left for you to do is determine how many bits to shift and implement this into your for loop!
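One possible way to put the shift-and-or idea into a loop is sketched below. It is only an illustration under the indexing used in the question (f(0) = '0', f(1) = '1'); the names fib_bits and to_bitstring are hypothetical. The length of each string is tracked separately, because the integer 0 carries no information about the length of '0'.

```python
def fib_bits(n):
    """Return (value, bit_length) of the n-th Fibonacci bit string as an integer."""
    prev_val, prev_len = 0, 1   # f(0) = '0'
    cur_val, cur_len = 1, 1     # f(1) = '1'
    if n == 0:
        return prev_val, prev_len
    for _ in range(1, n):
        # f(k+1) is f(k) followed by f(k-1): shift f(k) left by len(f(k-1)), then OR
        prev_val, prev_len, cur_val, cur_len = (
            cur_val, cur_len, (cur_val << prev_len) | prev_val, cur_len + prev_len)
    return cur_val, cur_len

def to_bitstring(value, length):
    """Render the integer as a bit string of exactly `length` digits."""
    return format(value, "0{}b".format(length))

print(to_bitstring(*fib_bits(5)))  # 10110101
```

This stores the bits more compactly than a Python string, but the number of bits still grows like the Fibonacci numbers, so it does not make n=100 feasible on its own.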
For really large n you will still have a memory issue shown in the plot:
Finally I got the answer, but can someone briefly explain why it works?
def count(p, n):
    count = 0
    i = n.find(p)
    while i != -1:
        n = n[i + 1:]
        i = n.find(p)
        count += 1
    return count
def occurence(p, n):
    a1 = "1"
    a0 = "0"
    lp = len(p)
    i = 1
    if n <= 5:
        return count(p, atring(n))
    while lp > len(a1):
        temp = a1
        a1 += a0
        a0 = temp
        i += 1
        if i >= n:
            return count(p, a1)
    fn = a1[:lp - 1]
    if -lp + 1 < 0:
        ln = a1[-lp + 1:]
    else:
        ln = ""
    countn = count(p, a1)
    a1 = a1 + a0
    i += 1
    if -lp + 1 < 0:
        lnp1 = a1[-lp + 1:]
    else:
        lnp1 = ""
    k = 0
    countn1 = count(p, a1)
    for j in range(i + 1, n + 1):
        temp = countn1
        countn1 += countn
        countn = temp
        if k % 2 == 0:
            string = lnp1 + fn
        else:
            string = ln + fn
        k += 1
        countn1 += count(p, string)
    return countn1
def atring(n):
    a0 = "0"
    a1 = "1"
    if n == 0 or n == 1:
        return str(n)
    for i in range(2, n + 1):
        temp = a1
        a1 += a0
        a0 = temp
    return a1
def fn():
    a = 100
    p = '10'
    print( occurence(p, a))
if __name__ == "__main__":
    fn()

Backtracing the longest palindromic subsequence

I modified the code from Geeks for Geeks to backtrace the actual subsequence, not only its length. But when I backtrace and get to the end where I can put an arbitrary character to the middle of the palindrome, I find my solution to be sloppy and not 'Pythonic'. Can someone please help me?
This piece smells particularly bad (if it works correctly at all):
if length_matrix[start][end] == 1 and substr_length >= 0:
    middle = sequence[start]
Here is the forward pass:
import numpy as np
def calc_subsequence_lengths(sequence):
    n = len(sequence)
    # Create a table to store results of subproblems
    palindrome_lengths = np.zeros((n, n))
    # Strings of length 1 are palindrome of length 1
    np.fill_diagonal(palindrome_lengths, 1)
    for substr_length in range(2, n + 1):
        for i in range(n - substr_length + 1):
            j = i + substr_length - 1
            if sequence[i] == sequence[j] and substr_length == 2:
                palindrome_lengths[i][j] = 2
            elif sequence[i] == sequence[j]:
                palindrome_lengths[i][j] = palindrome_lengths[i + 1][j - 1] + 2
            else:
                palindrome_lengths[i][j] = max(palindrome_lengths[i][j - 1],
                                               palindrome_lengths[i + 1][j])
    return palindrome_lengths
And here is the traceback:
def restore_palindrome(length_matrix, sequence):
    palindrome_left = ''
    middle = ''
    n, n = np.shape(length_matrix)
    # start in the north-eastern corner of the matrix
    substr_length, end = n - 1, n-1
    # traceback
    while substr_length > 0 and end > 1:
        start = end - substr_length
        # if possible, go left
        if length_matrix[start][end] == (length_matrix[start][end - 1]):
            substr_length -= 1
            end -= 1
        # the left cell == current - 2, but the lower is the same as current, go down
        elif length_matrix[start][end] == (length_matrix[start + 1][end]):
            substr_length -= 1
        # both left and lower == current - 2, go south-west
        else:
            palindrome_left += sequence[start]
            substr_length -= 2
            end -= 1
    if length_matrix[start][end] == 1 and substr_length >= 0:
        middle = sequence[start+1]
    result = ''.join(palindrome_left) + middle + ''.join(palindrome_left[::-1])
    return result, int(length_matrix[0][n-1])
Update
First off, the problem is to calculate the longest non-contiguous palindromic sequence (as stated in the article I referred to). For the sequence BBABCBCAB, the output should be BABCBAB
Secondly, as I have pointed out, I'm building upon an existing DP solution which works in O(N^2) time and space. It calculates the length just fine, so I need to backtrace the actual palindrome in the most elegant way, not sacrificing efficiency for elegance.
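One way the traceback is often written is with two indices i and j that close in on each other, so that the middle character needs no special case after the loop. The following is a hedged sketch, not the code from the article referred to above: it uses plain lists instead of NumPy, and the names lps_table and restore_lps are made up for this illustration. When both neighbouring cells are equally good, it arbitrarily prefers dropping the right end.

```python
def lps_table(s):
    """table[i][j] = length of the longest palindromic subsequence of s[i..j]."""
    n = len(s)
    table = [[0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        table[i][i] = 1
        for j in range(i + 1, n):
            if s[i] == s[j]:
                # table[i + 1][j - 1] is 0 when j == i + 1 (empty inner part)
                table[i][j] = table[i + 1][j - 1] + 2
            else:
                table[i][j] = max(table[i + 1][j], table[i][j - 1])
    return table

def restore_lps(s, table):
    """Walk the table from (0, n - 1) inwards and rebuild one optimal palindrome."""
    i, j = 0, len(s) - 1
    left, middle = [], ''
    while i <= j:
        if i == j:
            middle = s[i]          # a single leftover character is the middle
            break
        if s[i] == s[j]:
            left.append(s[i])      # matching ends belong to the palindrome
            i, j = i + 1, j - 1
        elif table[i + 1][j] > table[i][j - 1]:
            i += 1                 # dropping s[i] keeps the optimum
        else:
            j -= 1                 # otherwise dropping s[j] does
    return ''.join(left) + middle + ''.join(reversed(left))

s = "BBABCBCAB"
print(restore_lps(s, lps_table(s)))  # BABCBAB
```

The table still costs O(N^2) time and space; the walk itself takes at most N steps.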

Using Python for quasi randomization

Here's the problem: I want to choose randomly between two elements (say 0 or 1) n times, such that the final list has n/2 zeros and n/2 ones. I tend to get this kind of result: [0 1 0 0 0 1 0 1 1 1 1 1 1 0 0, ... until n]. The problem is that I don't want the same number to appear 4 or 5 times in a row so often. I know that I could use a quasi-randomisation procedure, but I don't know how to do so (I'm using Python).
To guarantee that there will be the same number of zeros and ones you can generate a list containing n/2 zeros and n/2 ones and shuffle it with random.shuffle.
For small n, if you aren't happy that the result passes your acceptance criteria (e.g. not too many consecutive equal numbers), shuffle again. Be aware that doing this reduces the randomness of the result, not increases it.
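The shuffle-and-retry idea for small n could look roughly like this. It is a minimal sketch; the function names and the max_tries cut-off are assumptions made for the example.

```python
import random
from itertools import groupby

def longest_run(seq):
    """Length of the longest block of equal consecutive values (0 if empty)."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)

def shuffle_until_ok(n, max_run, max_tries=1000, rng=random):
    """Shuffle a balanced 0/1 list until no run exceeds max_run; None if tries run out."""
    seq = [0] * (n // 2) + [1] * (n - n // 2)
    for _ in range(max_tries):
        rng.shuffle(seq)
        if longest_run(seq) <= max_run:
            return seq
    return None
```

Returning None instead of looping forever makes it visible when the acceptance criteria are too strict for the given n.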
For larger n it will take too long to find a result that passes your criteria using this method (because most results will fail). Instead you could generate elements one at a time with these rules:
If you already generated 4 ones in a row the next number must be zero and vice versa.
Otherwise, if you need to generate x more ones and y more zeros, the chance of the next number being one is x/(x+y).
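The two rules above can be sketched as follows. The function name, the max_run parameter and the optional rng argument are assumptions for this example. Note one limitation of the sketch: if only one kind of value is left near the end of the list, the run limit can no longer be enforced and the tail may contain a longer run.

```python
import random

def balanced_sequence(n, max_run=4, rng=random):
    """Generate n values with n // 2 ones, applying the two rules above."""
    ones_left = n // 2
    zeros_left = n - ones_left
    seq = []
    run_value, run_len = None, 0
    for _ in range(n):
        if run_len >= max_run and run_value == 1 and zeros_left > 0:
            val = 0  # rule 1: break a run of ones
        elif run_len >= max_run and run_value == 0 and ones_left > 0:
            val = 1  # rule 1: break a run of zeros
        else:
            # rule 2: P(one) = x / (x + y) with x ones and y zeros still to place
            val = 1 if rng.random() < ones_left / (ones_left + zeros_left) else 0
        if val == 1:
            ones_left -= 1
        else:
            zeros_left -= 1
        run_len = run_len + 1 if val == run_value else 1
        run_value = val
        seq.append(val)
    return seq
```

Because every value is drawn from what is still left to place, the result always contains exactly n // 2 ones, regardless of how the random draws fall.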
You can use random.shuffle to randomize a list.
import random
n = 100
seq = [0]*(n//2) + [1]*(n-n//2)
random.shuffle(seq)
Now you can run through the list and whenever you see a run that's too long, swap an element to break up the sequence. I don't have any code for that part yet.
Having 6 1's in a row isn't particularly improbable -- are you sure you're not getting what you want?
There's a simple Python interface for a uniformly distributed random number, is that what you're looking for?
Here's my take on it. The first two functions are the actual implementation and the last function is for testing it.
The key is the first function which looks at the last N elements of the list where N+1 is the limit of how many times you want a number to appear in a row. It counts the number of ones that occur and then returns 1 with (1 - N/n) probability where n is the amount of ones already present. Note that this probability is 0 in the case of N consecutive ones and 1 in the case of N consecutive zeros.
Like a true random selection, there is no guarantee that the ratio of ones to zeros will be exactly 1, but averaged out over thousands of runs, it does produce as many ones as zeros.
For longer lists, this will be better than repeatedly calling shuffle and checking that it satisfies your requirements.
import random
def next_value(selected):
    # Mathematically, this isn't necessary but it accounts for
    # potential problems with floating point numbers.
    if selected.count(0) == 0:
        return 0
    elif selected.count(1) == 0:
        return 1
    N = len(selected)
    selector = float(selected.count(1)) / N
    if random.uniform(0, 1) > selector:
        return 1
    else:
        return 0
def get_sequence(N, max_run):
    lim = min(N, max_run - 1)
    seq = [random.choice((1, 0)) for _ in range(lim)]
    for _ in range(N - lim):
        seq.append(next_value(seq[-max_run+1:]))
    return seq
def test(N, max_run, test_count):
    ones = 0.0
    zeros = 0.0
    for _ in range(test_count):
        seq = get_sequence(N, max_run)
        # Keep track of how many ones and zeros we're generating
        zeros += seq.count(0)
        ones += seq.count(1)
        # Make sure that the max_run isn't violated.
        counts = [0, 0]
        for i in seq:
            counts[i] += 1
            counts[not i] = 0
            if max_run in counts:
                print(seq)
                return
    # Print the ratio of zeros to ones. This should be around 1.
    print(zeros/ones)
test(200, 5, 10000)
Probably not the smartest way, but it works for "no sequential runs", while not generating the same number of 0s and 1s. See below for version that fits all requirements.
from random import choice
CHOICES = (1, 0)
def quasirandom(n, longest=3):
    serial = 0
    latest = 0
    result = []
    rappend = result.append
    for i in range(n):
        val = choice(CHOICES)
        if latest == val:
            serial += 1
        else:
            serial = 0
        if serial >= longest:
            # CHOICES[1] == 0 and CHOICES[0] == 1, so this flips the value
            val = CHOICES[val]
        rappend(val)
        latest = val
    return result
print(quasirandom(10))
print(quasirandom(100))
This one below corrects the filtering shuffle idea and works correctly AFAICT, with the caveat that the very last numbers might form a run. Pass debug=True to check that the requirements are met.
from random import random
from itertools import groupby # For testing the result
try:
    xrange
except NameError:
    xrange = range
def generate_quasirandom(values, n, longest=3, debug=False):
    # Sanity check
    if len(values) < 2 or longest < 1:
        raise ValueError
    # Create a list with n * [val]
    source = []
    sourcelen = len(values) * n
    for val in values:
        source += [val] * n
    # For breaking runs
    serial = 0
    latest = None
    for i in xrange(sourcelen):
        # Pick something from the not yet placed part source[i:]
        j = int(random() * (sourcelen - i)) + i
        if source[j] == latest:
            serial += 1
            if serial >= longest:
                serial = 0
                guard = 0
                # We got a serial run, break it
                while source[j] == latest:
                    j = int(random() * (sourcelen - i)) + i
                    guard += 1
                    # We just hit an infinite loop: there is no way to avoid a serial run
                    if guard > 10:
                        print("Unable to avoid serial run, disabling asserts.")
                        debug = False
                        break
        else:
            serial = 0
        latest = source[j]
        # Move the picked value to position i, the end of the placed part
        source[i], source[j] = source[j], source[i]
    # More sanity checks
    check_quasirandom(source, values, n, longest, debug)
    return source
def check_quasirandom(shuffled, values, n, longest, debug):
    counts = []
    # We skip the last entries because breaking runs in them get too hairy
    for val, count in groupby(shuffled):
        counts.append(len(list(count)))
    highest = max(counts)
    print('Longest run: %d\nMax run length:%d' % (highest, longest))
    # Invariants
    assert len(shuffled) == len(values) * n
    for val in values:
        assert shuffled.count(val) == n
    if debug:
        # Only checked if we were able to avoid a sequential run >= longest
        assert highest <= longest
for x in xrange(10, 1000):
    generate_quasirandom((0, 1, 2, 3), 1000, x//10, debug=True)
